ABSTRACT

A group membership protocol ensures agreement and consistent commit actions
among group members to maintain a sequence of identical group views in spite of
continuous changes, voluntary or otherwise, in processors' membership status.
In asynchronous distributed environments, such consistency among group views must
be guaranteed using messages over a network that does not bound message delivery
times. Assuming a network that provides only a reliable FIFO channel between
any pair of processors, one approach to designing such a protocol is to centralize the
responsibility to detect changes, ensure agreement, and commit them consistently in
a single manager process. This approach is complicated by the fact that a protocol to
elect a new manager with a consistent membership proposal must be executed when
the manager itself fails. In this thesis, a membership protocol that orders group
members in a logical ring, and thereby eliminates the need for such centralized
responsibility, is presented. Agreement and commit actions are token-based, and the
protocol ensures that no tokens are lost or duplicated due to changes in membership.
It is able to process continuous changes to the membership, does not depend upon any
majority-based decisions, and processes joins and departures identically. The cost of
committing a change is always 2n point-to-point messages over FIFO channels, where
n is the group size. The correctness of the protocol is proven in a formal framework.
The implementation details for executing the protocol on a network of SUN workstations
are presented, together with detailed examples of the behavior of the protocol for
various sequences of changes to group membership. The programs for the client-server
communication patterns used to interface the various functions are also presented.




TABLE OF CONTENTS

I. INTRODUCTION
  A. BACKGROUND
  B. OBJECTIVES OF THE STUDY
  C. THESIS ORGANIZATION
II. EXISTING APPROACHES
  A. GROUP MEMBERSHIP PROBLEM
    1. Importance
    2. System Classification
  B. SYNCHRONOUS SYSTEMS
    1. Periodic Broadcast Protocol
    2. Attendance List Protocol
    3. Robust Group Membership Algorithm
  C. ASYNCHRONOUS SYSTEMS
    1. Failure Detection and Notification Protocol
    2. Protocol Based on Total Message Ordering
    3. Protocol Based on Rotating Token List
    4. ISIS Approach
III. GROUP MEMBERSHIP PROTOCOL
  A. ASSUMPTIONS, OVERVIEW, AND DEFINITIONS
    1. Overview
      a. Processing of Individual Changes
    2. Definitions
      a. Group Membership Problem Definition


NAVAL POSTGRADUATE SCHOOL
CALIFORNIA 93943-5002


      b. Logical Ring
      c. Tokens
      d. Neighbor and Host Computation
  B. THE MEMBERSHIP PROTOCOL
    1. Status Change Detection and Agreement Initiation
    2. The Agreement Phase
    3. The Commit Phase
      a. Effects of a Commit Action
      b. Ensuring an Identical Sequence of Commits
  C. CORRECTNESS OF THE PROTOCOL
IV. IMPLEMENTATION OF THE PROTOCOL
  A. PROTOCOL SOFTWARE DESIGN
    1. Functions in the Protocol
    2. Subcomponents of MP
  B. DATA STRUCTURE DEFINITIONS
  C. PROCESS SPECIFICATIONS
    1. FIFO-Channel-Layer
    2. Initiate-Departure
    3. Initiate-Join
    4. Agreement Process
    5. Commit Process
    6. TokenPool Manager
    7. StatusTable Manager
    8. Group View Manager
    9. Join Initial
  D. IMPLEMENTATION ON UNIX MACHINES
    1. Pipes
    2. FIFOs
    3. Message Queues
    4. Sockets
    5. Transport Layer Interface
V. AN EXAMPLE
  A. INITIAL CONDITIONS
  B. EXPLANATION OF THE EXAMPLE
    1. Failure of a Single Member
    2. Join of a Single Member
    3. Multiple Failures and Joins
VI. CONCLUSIONS AND FUTURE DIRECTIONS
  A. CONCLUSIONS
  B. FUTURE WORK
APPENDIX A
  A. GROUPVIEW SERVER
  B. TOKENPOOL SERVER
  C. STATUS TABLE SERVER
  D. COMMIT PROCESS SERVER
REFERENCES
INITIAL DISTRIBUTION LIST


LIST OF TABLES

4.1 PROCESSES AND THEIR FUNCTIONS
4.2 DIFFERENT ACTION ORIENTED DATA STRUCTURES
4.3 DIFFERENT TOKENS WITH THEIR TOKEN-TYPES
5.1 SNAPSHOT OF INITIAL CONDITION
5.2 SNAPSHOT AT THE END OF AGREEMENT PHASE
5.3 SNAPSHOT WHEN ONLY p3 HAS COMMITTED
5.4 SNAPSHOT WHEN ALL MEMBERS HAVE COMMITTED p2
5.5 SNAPSHOT BEFORE INITIATING AGREEMENT FOR A JOIN
5.6 SNAPSHOT WHEN p6 IS COMMITTED AT THE HOST p0
5.7 SNAPSHOT WHEN ALL MEMBERS HAVE COMMITTED p6
5.8 SNAPSHOT SHOWING MULTIPLE AGREE TOKENS
5.9 GROUPVIEW FOR SUCCESSIVE VIEW NUMBERS


LIST OF FIGURES

3.1 A Logical ring
3.2 Algorithm for monitoring and agreement initiation
3.3 Algorithm for reporting the status
3.4 Algorithm to initiate a join
3.5 Protocol for agreement tokens
3.6 Protocol for committing a change
3.7 Protocol to process a commit token
4.1 Topmost view of MP interactions
4.2 MP process interactions
4.3 Action-type message structure
4.4 GroupView message structure
4.5 Neighbor-Address structure
4.6 StatusTable message structure
4.7 Token structure
4.8 TokenPool message structure
4.9 GroupView
4.10 Status-Monitoring message structure
4.11 Status-Table
4.12 TokenPool
4.13 FIFO channel process
4.14 Send process
4.15 Receive process
4.16 Initiate-Departure process
4.17 Failure-Monitor
4.18 Initiate-Agreement process
4.19 Timing process
4.20 Initiate-Join process
4.21 Agreement process
4.22 Commit process
4.23 TokenPool-Manager process
4.24 StatusTable-Manager process
4.25 GroupView-Manager process
4.26 Message queue structure


ACKNOWLEDGMENT

I would like to place on record my sincere thanks to my thesis advisor, Prof.
Sridhar Shukla, for all the help he has given me in the execution of my thesis research.
I would also like to thank the Director (Training), Directorate of Training and Sponsored
Research, for giving me the opportunity to take up this course. I am also deeply indebted
to the Director, Defense Electronics Research Laboratory, for all the encouragement
he has given me throughout my career. I would like to thank my wife, Lalitha, for being
patient with me and helping me a great deal to complete this course successfully.




I. INTRODUCTION

A. BACKGROUND

Distributed computing systems are becoming increasingly popular for tackling
large computational problems associated with large defense systems. A distributed
system is a system with many processing elements and storage devices connected
together by a network. Fault tolerance and parallelism are the two important properties
of distributed systems [Ref. DSS1]. The fault tolerance of a distributed system comes
from the replication of data and programs among several processing elements: when
one processor fails, another can take over the work and complete it. The presence of
several processing elements also makes it possible to divide a program into several
segments to be executed in parallel, resulting in a speed-up.

Exploiting parallelism or achieving fault tolerance requires communication among
processors. In fact, multiple processes in one processor have to communicate with
multiple processes in other processors. Conventional operating systems provide a
network-level interface for this type of communication. In a distributed system, where
such communication is basic to programs, managing communication between processes
while writing correct, efficient programs becomes a very complex task. Thus, there
is a need for higher-level communication primitives that make writing distributed
programs less difficult. These primitives remove the burden of managing interprocess
communication from the application developer. The important features required of
these primitives are reliability and minimal communication delay.

Using process groups is one approach to writing distributed applications
[Ref. B+90]. It is based on reliable communication and simplifies the writing of
applications. Process groups arise when processes cooperate to perform a task, share
memory, subdivide a computation, and so forth. For example, there could be a process
group in which the leader searches a database and, in the event of its failure, some
other member takes over and finishes the task. In this example, the process group is
used to provide a fault-tolerant service.

The main features of the process group approach are failure atomicity for
multicasts and membership atomicity for failures as well as joins to a group. A
multicast is failure atomic if either all members receive it or none of them does.
Membership atomicity means that if a member joins or leaves the group, either
everyone agrees on the change or no one does.

The guarantees provided by such reliable primitives require that all members of a
group commit each failure or join of a member in the same sequence, so that
membership changes to the group are consistent at all member sites. The Group
Membership Problem (GMP) is the problem of agreeing on the membership of a group
and disseminating that information consistently among the members of the group.
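The consistency requirement can be stated as a simple check over the views each member has committed. The sketch below is only an illustration under assumed conventions: a view is a Python `frozenset` of member names, a member's history is the list of views it has committed in order, and a member that lags behind holds a prefix of the common sequence. The function name `views_consistent` is hypothetical and does not come from the thesis implementation.

```python
from typing import Dict, FrozenSet, List

View = FrozenSet[str]


def views_consistent(histories: Dict[str, List[View]]) -> bool:
    """Return True if every member has committed the same views in the same
    order.  A member that has not yet committed the latest changes may hold
    a prefix of the longest history."""
    if not histories:
        return True
    longest = max(histories.values(), key=len)
    return all(h == longest[:len(h)] for h in histories.values())


# Example: p1 fails, then p3 joins.
v0 = frozenset({"p0", "p1", "p2"})
v1 = frozenset({"p0", "p2"})
v2 = frozenset({"p0", "p2", "p3"})
print(views_consistent({"p0": [v0, v1, v2], "p2": [v0, v1]}))  # True
```

A history in which one member commits the join of p3 before the failure of p1, while another commits them in the opposite order, fails this check; that is exactly the inconsistency the GMP rules out.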

B. OBJECTIVES OF THE STUDY

In this thesis, a decentralized mechanism for providing a consistent group view
at all member sites is presented. This approach assumes that the members are
interconnected over a network of reliable FIFO channels. The GMP is solved by requiring
that each increment of the view number be associated with successive views that
differ by only one member. This approach also assumes that the only failure mode is
fail-stop [Ref. Cri88] and that processors do not behave maliciously. It uses tokens
for achieving agreement and commit actions.
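The requirement that each increment of the view number correspond to successive views differing by only one member can be checked mechanically. In the sketch below, which is illustrative and not taken from the thesis code, the hypothetical function `check_view_sequence` takes a mapping from view number to membership set and verifies that the view numbers are consecutive and that each step adds or removes exactly one member.

```python
from typing import Dict, FrozenSet


def differs_by_one(old: FrozenSet[str], new: FrozenSet[str]) -> bool:
    # The symmetric difference holds exactly one process for a single
    # join (new member) or a single departure (removed member).
    return len(old ^ new) == 1


def check_view_sequence(views: Dict[int, FrozenSet[str]]) -> bool:
    """Views must be numbered consecutively and differ by one member per step."""
    if not views:
        return True
    numbers = sorted(views)
    if numbers != list(range(numbers[0], numbers[0] + len(numbers))):
        return False  # a view number was skipped
    return all(differs_by_one(views[k], views[k + 1]) for k in numbers[:-1])
```

Under this rule, two simultaneous failures are never committed as a single view change; they appear as two consecutive view numbers.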


Unlike the other approaches described in the next chapter, this approach is fully
distributed and hence does not centralize the responsibility for ensuring consistent
view changes. The number of messages needed to commit a change grows linearly with
the number of members. The protocol is also non-blocking, and members can leave and
join the group continuously.
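To make the linear growth concrete, the sketch below counts point-to-point messages under a simplified reading of the abstract's cost figure: one token travels once around a logical ring of n members to gather agreement, and a second token travels once around to commit the change, giving 2n messages. The ring representation, the function names, and the absence of failures during the circuits are all assumptions made for illustration; the actual agreement and commit rules are those of Chapter III.

```python
from typing import List, Tuple


def circulate(ring: List[str], start: str) -> List[Tuple[str, str]]:
    """Pass a token once around the ring, starting and ending at `start`.
    Each (sender, receiver) pair is one point-to-point FIFO message."""
    i = ring.index(start)
    n = len(ring)
    return [(ring[(i + k) % n], ring[(i + k + 1) % n]) for k in range(n)]


def commit_cost(ring: List[str], initiator: str) -> int:
    agree = circulate(ring, initiator)   # first circuit: agreement token
    commit = circulate(ring, initiator)  # second circuit: commit token
    return len(agree) + len(commit)


ring = ["p0", "p1", "p2", "p3", "p4"]
print(commit_cost(ring, "p2"))  # 10, i.e. 2n for n = 5
```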

C. THESIS ORGANIZATION

This thesis has six chapters. Chapter II reviews earlier membership protocols, their
merits, and their problems.

In Chapter III, the decentralized membership protocol is discussed and formal
algorithms are given. The implementation details, the process specifications, and the
data structures used are given in Chapter IV. Since the implementation involves a
great deal of interaction among processes, the system calls used in the implementation
are also discussed in that chapter. Chapter V gives an extended example of the working
of the protocol. Chapter VI presents conclusions and directions for future work in
this area. Appendix A gives a listing of the programs developed.


II. EXISTING APPROACHES

This chapter deals with the existing approaches to the Group Membership Problem
(GMP). It first describes the GMP and why it matters in electronic warfare
applications. It then describes various approaches to the GMP in detail.

A. GROUP MEMBERSHIP PROBLEM

The task of managing a distributed computation containing replicated processes
is best formulated in terms of managing process groups, where each group
represents a fault-tolerant process. A process group's membership changes when its
processes fail (they are removed), when they recover (they are reinstated), when new
processes join, and when processes leave voluntarily. The members of a process group
query the membership view and take actions based on it. Agreement on the membership
of a group of processes is essential to avoid inconsistency. This problem of agreeing
on the membership of the group is called the Group Membership Problem.

1. Importance

As an example, consider an Electronic Warfare system that is required to perform
the complicated task of initiating countermeasures based on threat assessment. Threat
assessment is based on several parameters, such as the type of enemy platform, the
threat priority, and the most effective countermeasures possible. These parameters are
evaluated by a group of processors that interact with one another through messages.
The messages could be broadcasts, i.e., the same message is sent to different
processors for a collective action. All operational processors must agree on failures
in order to take correct and consistent corrective action.


Suppose that a processor A sends messages to two other processors, B and C, for a
particular sequence of actions to take place. If A fails after sending the message to B
but before sending the message to C, it is possible that B does not know of the failure
of A. In this case, the actions taken by B and C could be inconsistent and erroneous.
If both B and C knew of the failure of A, they would be able to recover from the
failure based on consistent information. Similar examples can be found in the database
field and in real-time applications [Ref. CT90].
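The scenario can be reproduced with a few lines of code. The sketch below assumes that A delivers a multicast as a series of point-to-point sends and models its fail-stop crash with a hypothetical `crash_after` parameter; `failure_atomic` checks the all-or-none property defined in the introduction. Neither function is part of the thesis software.

```python
from typing import Dict, List, Optional


def multicast(receivers: List[str], msg: str,
              crash_after: Optional[int] = None) -> Dict[str, Optional[str]]:
    """Send `msg` to each receiver in turn.  If `crash_after` is k, the sender
    fail-stops after k sends, so the remaining receivers get nothing."""
    delivered: Dict[str, Optional[str]] = {r: None for r in receivers}
    for k, r in enumerate(receivers):
        if crash_after is not None and k >= crash_after:
            break  # sender has crashed; no further sends
        delivered[r] = msg
    return delivered


def failure_atomic(delivered: Dict[str, Optional[str]]) -> bool:
    # All-or-none: every receiver got the message, or none of them did.
    got = [m is not None for m in delivered.values()]
    return all(got) or not any(got)


outcome = multicast(["B", "C"], "start-countermeasure", crash_after=1)
print(outcome)                  # {'B': 'start-countermeasure', 'C': None}
print(failure_atomic(outcome))  # False
```

B acts on the message while C never receives it; a membership protocol that makes both B and C commit the failure of A gives them the consistent information needed to recover.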

2. System Classification

Distributed systems can be classified into synchronous and asynchronous systems.
In synchronous systems, all events are deemed to happen one at a time. In this type
of system, the groupview is frozen at the time a message is sent: all messages wait
until changes to group membership are complete, and all membership changes wait
until all pending messages are sent. The clocks of the interacting processes are
closely synchronized, and there is a known upper bound on message delivery time.

In an asynchronous system, there is no relationship between the clocks of interacting
processors, and the time for message delivery is unknown. It is not possible to be
certain of the failure of any process, since there is no upper bound on the time a
message takes to be delivered. Therefore, processes are only perceived to have failed,
and crashes are indistinguishable from communication delays. Processes perceived to
have failed must be removed from the group; otherwise, it is impossible to reach
consensus on the failure of a processor [Ref. FLP85]. The rest of this chapter
discusses several existing approaches to the group membership problem.
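The consequence for failure detection can be shown with a timeout-based detector. In the sketch below (the function name and the millisecond timestamps are hypothetical), a process not heard from within the timeout is only suspected: a crashed process and a merely slow one produce the same observation, which is why an asynchronous membership protocol removes perceived failures rather than proven ones.

```python
from typing import Dict, Set


def suspected(last_heard: Dict[str, int], now: int, timeout: int) -> Set[str]:
    """Return processes whose last message is older than `timeout`.
    In an asynchronous system this is a suspicion, never a certainty."""
    return {p for p, t in last_heard.items() if now - t > timeout}


# A has crashed at time 0; C is alive, but its messages are delayed.
last_heard = {"A": 0, "B": 9_500, "C": 2_000}
print(sorted(suspected(last_heard, now=10_000, timeout=5_000)))  # ['A', 'C']
```

Any finite timeout can be exceeded by a slow but correct process such as C, so the detector cannot be made exact simply by choosing a larger value.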


B. SYNCHRONOUS SYSTEMS

In a synchronous distributed system, the processor clocks are synchronized, so a
global time is available at all processors. There is a known upper bound on message
delivery time, which allows a failure to be detected within a bounded time. Cristian
has given one method for solving the GMP in this setting [Ref. Cri88]. Another
approach is given by Ezhilchelvan and Lemos [Ref. EzLe90]. These are discussed in
greater detail in the following paragraphs.

1. Periodic Broadcast Protocol

This protocol, developed by Cristian [Ref. Cri88], assumes a synchronous
communication network that provides a message diffusion service and a bounded
delay on message delivery time. These assumptions lead to the premise that two
processors are unable to communicate only if one of them has failed. It also assumes
an atomic broadcast that tolerates performance failures, i.e., failures of the
communication link to deliver messages within a known bounded time.

In this protocol, all members periodically send messages to one another announcing
their presence. Since an atomic broadcast is assumed, all operational processors
receive each such message. If a processor fails, it is unable to send its periodic
message, so all the other members learn of its failure within a bounded delay and
remove the failed member from their views. Each periodic message carries the clock
time associated with it, which is used to synchronize the clocks. The renewal time
for the next broadcast is a constant added to the synchronized time.

When a new processor wants to join, it sends a new-group message with the group id
to all the members. All the members of that group respond to the joining processor
by sending it a present message. The joining processor uses these messages to create
its view of the group at that point. The renewal time is then adjusted to take the
joining member into account.

This protocol has the advantage that failures are detected in the shortest possible
time. If a processor fails immediately after signaling its presence, its failure is
noticed within a time equal to the renewal period plus the maximum message delivery
time. The drawback of this protocol is that it requires n atomic broadcasts every
renewal period, where n is the number of members in the group.
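The two figures in the paragraph above can be worked out directly. The sketch below uses hypothetical parameter names (`renewal_ms` for the renewal period and `max_delay_ms` for the bound on message delivery) and integer milliseconds; it computes the worst-case detection time stated in the text and the number of atomic broadcasts the group issues over an interval, which grows with n even when nothing changes.

```python
def worst_case_detection_ms(renewal_ms: int, max_delay_ms: int) -> int:
    # A processor that fails just after announcing its presence is missed
    # at the next renewal; the others notice once the maximum delivery
    # delay has also elapsed.
    return renewal_ms + max_delay_ms


def broadcasts_over(n_members: int, interval_ms: int, renewal_ms: int) -> int:
    # Every member issues one atomic broadcast per renewal period,
    # whether or not the membership changes.
    return n_members * (interval_ms // renewal_ms)


print(worst_case_detection_ms(1_000, 200))  # 1200
print(broadcasts_over(5, 60_000, 1_000))    # 300
```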

2. Attendance List Protocol

This protocol was also developed by Cristian [Ref. Cri88] and assumes the same type
of broadcast facility and communication network as the previous one. New members
also join in the same way. In this protocol, the membership is checked by sending a
datagram to all the members some time after a join is completed. This datagram
reaches all members within a bounded time, and all members check whether they
received it within that time. If there is a failure, at least one member does not
receive the list, and it initiates a new join phase. The failed member does not
participate in this phase, so the other members remove it from the group. In the
absence of changes, this protocol has a lower message overhead than the periodic
broadcast protocol and is therefore more efficient, but the reduced overhead comes at
the cost of a longer failure detection time.
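One way to see why a failure is noticed is to model the list as being forwarded from member to member in a fixed cyclic order and returning to the member that started it. That forwarding order is an assumption of this sketch (the paragraph above says only that a datagram reaches all members within a bounded time), and the function name `circulate_list` is hypothetical. A fail-stop member neither signs nor forwards the list, so every live member after it, and the originator waiting for the list to return, misses the deadline and would start a new join phase.

```python
from typing import List, Set, Tuple


def circulate_list(order: List[str], failed: Set[str]) -> Tuple[List[str], Set[str]]:
    """Forward an attendance list along `order`, starting at order[0].

    Returns the members that signed the list and the live members that
    detect a failure because the list did not reach them (or did not
    come back to the originator) before the deadline."""
    signed: List[str] = []
    for m in order:
        if m in failed:
            break  # a fail-stop member does not forward the list
        signed.append(m)
    detectors = {m for m in order if m not in failed and m not in signed}
    returned = len(signed) == len(order)
    if not returned and order[0] not in failed:
        detectors.add(order[0])  # the originator never gets the list back
    return signed, detectors


order = ["p0", "p1", "p2", "p3"]
signed, detectors = circulate_list(order, failed={"p2"})
print(signed, sorted(detectors))  # ['p0', 'p1'] ['p0', 'p3']
```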

3. Robust Group Membership Algorithm

This algorithm, proposed by Ezhilchelvan and Lemos [Ref. EzLe90], is designed for
real-time systems. It assumes a failure-free broadcast network that preserves the order
of messages and bounds the message delivery time. It also assumes that all processors
access the medium in a known order and can detect the absence of a processor's
broadcast within a bounded time interval. The algorithm deals with send failures,
receive failures, and crash failures.

All processors maintain a vector denoting the status of each member, which each
processor transmits periodically to all members in a time slot known to all members.
During each cycle, all processors exchange this information with one another. In this
way, all processors have the same information and, by executing the same algorithm,
arrive at the same results. The member status is continually updated according to the
messages received. If no message is received from a member, its status is recorded as
message absent; if the vector received differs from the local one, the sender is
designated as failed, because that processor might have suffered a receive failure and
might not have received the earlier messages. The new group membership is determined
from these updates. Each processor checks for a majority of operational processors in
every cycle. If a majority of the processors have failed, the processor sets its own
status to failed and stops execution.
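A single cycle of the status update described above can be sketched as follows. This is a simplified illustration, not the algorithm of [Ref. EzLe90]: it assumes that each processor compares every received status vector with its own local vector, that a processor's own vector is included among those received, and that the constants and function names are hypothetical.

```python
from typing import Dict, Optional

PRESENT, ABSENT, FAILED = "present", "message-absent", "failed"


def update_status(local: Dict[str, str],
                  received: Dict[str, Optional[Dict[str, str]]]) -> Dict[str, str]:
    """`received[p]` is the vector p broadcast in its slot, or None if no
    message arrived.  No message gives 'message-absent'; a vector that
    differs from ours marks the sender failed (it may have a receive failure)."""
    status: Dict[str, str] = {}
    for p, vector in received.items():
        if vector is None:
            status[p] = ABSENT
        elif vector != local:
            status[p] = FAILED
        else:
            status[p] = PRESENT
    return status


def has_majority(status: Dict[str, str]) -> bool:
    # A processor that does not see a majority of operational processors
    # sets its own status to failed and stops.
    return sum(s == PRESENT for s in status.values()) > len(status) // 2
```

Because every processor receives the same broadcasts in the same slots, each one running this update arrives at the same status vector, which is the property the paragraph above relies on.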

This algorithm takes action in a distributed manner. Its drawback is that it requires
all processors to have a priori knowledge of the sequence of medium access for all
processors.

C. ASYNCHRONOUS SYSTEMS

In an asynchronous distributed system, there is no upper bound on message delivery
time. There is no synchronization of clocks and hence no notion of global time. Since
the delivery delay is unbounded, it is impossible to distinguish between failures and
communication delays. Therefore, in an asynchronous system, processors are only
perceived to have failed. Because of these constraints, algorithms require multiple
message rounds to commit a change. The following paragraphs deal with approaches
by Chang, Birman, Bruso, and Smith.

1. Failure Detection and Notification Protocol

This protocol, developed by Bruso [Ref. Bru85], is aimed at distributed database
systems on a token ring network. It detects the failure of nodes and notifies all
nodes when a recovery is complete. It is designed for crash failures and
communication isolation. In this protocol, every message must be acknowledged.
The protocol is divided into failure detection and recovery reporting segments.

Failure detection is decentralized and is achieved in the following way. If a
processor does not receive an acknowledgement for a message after many
retransmissions, it initiates failure detection by sending node down messages to all
other processors. Every processor receiving this message retransmits it to all other
processors, which leads to a flood of messages in which each processor sends to every
other processor. Each of these messages is acknowledged, and the replies are used to
determine which nodes are reachable. If an acknowledgement is not received, failure
detection is started for the node from which it was not received. Processors receiving
a node down message change that node's status to down without attempting to verify
it. In this way, the integrity of the member status at all nodes is preserved, and the
approach is robust with respect to multiple failures.

Recovery reporting is a centralized function. When a processor recovers, it sends a
node up message to all processors. Processors receiving the message note the status
as up and send an acknowledgement to the recovered process. The recovered process
updates its local status based on the acknowledgements. It also verifies that all
processors it can reach agree about the status of all down nodes; this is done by
initiating the failure detection protocol for the members from which no
acknowledgement was received, which causes all other processors to follow suit. The
protocol uses a version number, denoting the number of times a node has failed, to
reach agreement. The greatest merit of this protocol is that it is simple and robust
to multiple failures. Its drawback is that the number of messages does not scale
linearly as the number of members increases.
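The quadratic cost of the flooding step can be counted with a small simulation. The sketch assumes (hypothetically) that only live processors take part, that each processor relays the node down message exactly once to every other live processor, and that every relayed message is acknowledged; under these assumptions n live processors exchange 2n(n - 1) messages for a single failure.

```python
from collections import deque
from typing import List, Set, Tuple


def flood_node_down(live: List[str]) -> Tuple[Set[str], int]:
    """live[0] detects the failure.  Each processor that learns of it relays
    the node down message once to every other live processor; every message
    is acknowledged.  Returns (informed processors, total messages)."""
    informed = {live[0]}
    frontier = deque([live[0]])
    sends = 0
    while frontier:
        p = frontier.popleft()
        for q in live:
            if q == p:
                continue
            sends += 1  # p -> q: node down
            if q not in informed:
                informed.add(q)
                frontier.append(q)
    acks = sends  # one acknowledgement per message
    return informed, sends + acks


print([flood_node_down([f"p{i}" for i in range(n)])[1] for n in (4, 8, 16)])  # [24, 112, 480]
```

Doubling the group roughly quadruples the traffic, which is the scaling problem noted above.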


2. Protocol Based on Total Message Ordering

This protocol by Moser et al. is built on top of protocols guaranteeing reliable and
totally ordered message broadcasts [Ref. LSA91]. It is non-blocking and tolerates
partitions. It assumes an underlying fault-tolerant ordering protocol, and no
broadcasts are delayed when there is a membership change.

In this protocol, all messages are associated with ordinal numbers denoting their
position, since a total order common to all processors is assumed. An agreement
process runs on the ordering of messages. If a processor does not order a message for
a long time, it is deemed to have failed, and a failure notification message is sent by
the processor that detected the failure. Since all messages are ordered, this message
is delivered in the same relative order at all other members, and the processor is
removed from the group.

When a processor wants to join the group, it sends a special message called a request
message. When another processor receives this message, it assigns it an ordinal number
and sends an acknowledgement message to the requesting processor with the ordinal
number of the most recent message it has ordered. The joining processor then orders
all the messages it receives and, when it receives the ordered message, determines that
it has been admitted to the group and starts sending messages. These protocols provide
only incremental changes to the configuration. To obtain the complete view,
initialization algorithms can be used to initialize the view each time a processor
joins the configuration.
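A total order yields consistent views because every member applies the same membership events in the same order. The sketch below is an illustration under assumptions: events are hypothetical `(ordinal, kind, process)` tuples, the ordinal numbers come from the underlying ordering protocol, and a member may receive events in any order but applies them by ordinal number.

```python
from typing import FrozenSet, Iterable, List, Tuple

Event = Tuple[int, str, str]  # (ordinal number, "fail" or "join", process)


def apply_ordered(initial: Iterable[str], events: Iterable[Event]) -> List[FrozenSet[str]]:
    """Apply membership events in ordinal-number order; return every view."""
    view = set(initial)
    views = [frozenset(view)]
    for _, kind, proc in sorted(events):
        if kind == "fail":
            view.discard(proc)
        elif kind == "join":
            view.add(proc)
        views.append(frozenset(view))
    return views


# Two members see the same events arrive in different orders.
arrived_at_p0 = [(5, "fail", "p1"), (7, "join", "p4")]
arrived_at_p2 = [(7, "join", "p4"), (5, "fail", "p1")]
start = ["p0", "p1", "p2"]
print(apply_ordered(start, arrived_at_p0) == apply_ordered(start, arrived_at_p2))  # True
```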

This protocol has a low communication overhead for membership changes. However,
it assumes a total order on messages, which requires multiple message rounds for
agreement.

3. Protocol Based on Rotating Token List

This protocol by Maxemchuk and Chang grew out of the development of reliable
broadcast protocols using a rotating token site [Ref. CM84]. In this protocol, the
token list is reformed whenever there is a failure or recovery. It is a three-phase
protocol that assumes a datagram service and fail-stop behavior. A token list contains
all members of the group. A site that detects a failure or recovery is the originator of
the protocol and invites the other sites in the group to form a new list. To prevent
multiple lists from being generated, a site can join only one list, and the list
containing the majority is taken to be the valid list.

All lists have a version number attached to them, and a site can join only a list with
a higher version number. These two rules ensure that only one valid list is generated.
The originator receives responses from the other members containing the timestamp of
the next message they are expecting and the version number of their list. If responses
arrive from a majority of the members in the old list, it creates a new list consisting
of the members who have responded and passes it to all the members in the new list.
If it does not have a majority, it aborts the reformation phase. In phase 3, if all the
members in the token list agree, the new token list is committed and the token is
passed to the new token site. The token site is selected based on the message
timestamps: the member with the largest timestamp is elected to be the token site
because it has received the most messages. Timeouts are incorporated in the protocol
so that no site waits forever for responses.
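The reformation rules can be sketched as a single function run by the originator. The sketch is illustrative only: the response format `(timestamp, version)`, the assumption that the originator's own response is included, the placement of newly recovered sites at the end of the list, and the name `reform_list` are all hypothetical. It applies the version rule, the majority-of-the-old-list rule, and the largest-timestamp rule for choosing the token site.

```python
from typing import Dict, List, Optional, Tuple


def reform_list(old_list: List[str], new_version: int,
                responses: Dict[str, Tuple[int, int]]) -> Optional[Tuple[List[str], str]]:
    """`responses[site]` = (timestamp of the next message the site expects,
    version of the list it currently belongs to).
    Returns (new token list, new token site), or None if reformation aborts."""
    # A site may join only a list with a higher version than its current one.
    joined = {s: ts for s, (ts, v) in responses.items() if new_version > v}
    from_old = [s for s in old_list if s in joined]
    if len(from_old) <= len(old_list) // 2:
        return None  # no majority of the old list: abort this reformation
    recovered = sorted(s for s in joined if s not in old_list)
    new_list = from_old + recovered
    # The site that has received the most messages becomes the token site.
    token_site = max(new_list, key=lambda s: joined[s])
    return new_list, token_site
```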

This protocol is blocking and is likely to become unmanageable if there are frequent
changes to the group. When the token site fails, the reformation protocol is more
complicated and requires another round of communication to recover any lost messages.

4. ISIS Approach

This is a non-blocking approach developed by Birman et al. [Ref. RB91]. It assumes
that processes communicate through a completely connected network of reliable FIFO
channels. There is no bound on message delivery times and no global clock. It does
not assume any underlying fault-tolerant communication. Processes fail only by
crashing, and all recovering processes are joined as new processes.

This protocol takes a centralized approach. A process designated as Mgr (for
manager) is responsible for coordinating updates to the local views of other processes.
When a process finds another process faulty, it informs the Mgr. Mgr then initiates a
two-phase protocol to commit the failure. In phase 1, it sends a message reporting the
failure of the member to all the other members of the group and awaits their
responses. At the end of this phase, all operational members agree on the failure of
the member. In phase 2, Mgr broadcasts a commit message instructing all members to
remove the member from their groupviews.
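The message pattern of the two phases can be sketched as below, assuming (as a simplification) that no further member fails while the protocol runs. The function `mgr_remove` and the message labels are hypothetical. The sketch makes the centralization visible: every message is sent or received by Mgr, which is why the failure of Mgr itself needs the separate reconfiguration algorithm.

```python
from typing import List, Tuple

Message = Tuple[str, str, str]  # (sender, receiver, kind)


def mgr_remove(view: List[str], mgr: str, faulty: str) -> Tuple[List[str], List[Message]]:
    others = [p for p in view if p not in (mgr, faulty)]
    log: List[Message] = []
    # Phase 1: Mgr reports the failure and awaits every member's response.
    log += [(mgr, p, "failure-notice") for p in others]
    log += [(p, mgr, "response") for p in others]
    # All operational members now agree that `faulty` has failed.
    # Phase 2: Mgr tells each member to commit the change.
    log += [(mgr, p, "commit") for p in others]
    new_view = [p for p in view if p != faulty]
    return new_view, log


view = ["p0", "p1", "p2", "p3", "p4"]
new_view, log = mgr_remove(view, mgr="p0", faulty="p3")
print(new_view, len(log))  # ['p0', 'p1', 'p2', 'p4'] 9
```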

If the Mgr fails in the middle of a commit phase, no system view will exist. To
reestablish the view, a reconfiguration algorithm deals with the progression and
succession problems. This is a three-phase algorithm. The initiator, a member that has
belonged to the group through the largest number of changes, broadcasts a
reconfiguration interrogation message to all the processors in its local view and
awaits their responses. Based on a majority of responses, which reflect the local states
of the other processes, the initiator determines an update event whose execution
restores the system view. The initiator broadcasts this event as the reconfiguration
proposal message. It then sends the commit message after receiving a majority
response. The election of a new manager must avoid invisible commits. One of the
problems of this protocol is that it tolerates only a minority of failures in successive
views.
III. GROUP MEMBERSHIP PROTOCOL

In the previous chapter, several approaches to the GMP were discussed and their merits and problems were highlighted. A decentralized Group Membership Protocol has been proposed by Shukla and Devalla in [Ref. ShDr]. In this thesis we elaborate further on the protocol and describe an implementation. This chapter describes the basic functions and algorithm of the protocol and therefore contains material reproduced directly from [Ref. ShDr].

The  basic  functions  required  of  group  membership  protocols  are  to  detect  changes 
in  the  membership  and  ensure  that  all  operational  members  commit  these  changes  to 
their  local  views  consistently.  The  consistent  commit  requirement  entails  an  agree- 
ment about  the  change  detected.  Given  such  a  protocol,  higher  level  tools  for  con- 
structing distributed  applications,  such  as  ISIS  [Ref.  BSS91],  can  be  constructed. 
Solution of the GMP is complicated by the following two properties of asynchronous distributed systems. First, since a failed process cannot be distinguished from a slow one, failures cannot be detected with certainty. Any failure is only a perceived failure to which every member must eventually commit. Second, unless the underlying network communication is embellished in some manner, such as total ordering of messages [Ref. LSA91] or total ordering of access to the communication medium [Ref. EzLe90], consistency must be achieved using only a network of reliable, first-in, first-out (FIFO) communication channels whose delay is unbounded. This implies that agreement and consistent commits can be achieved only through multiple rounds of messages.

In this approach, as in [Ref. BSS91], it is assumed that all communication between members of a group carries a view number. The GMP is solved by requiring that each increment of the view number be associated with successive views that differ in only one member, and by guaranteeing that a given view number has the same membership at every operational member. The protocol proposed herein uses a completely connected network of reliable FIFO channels and incorporates continuous changes to the group membership without requiring a priori knowledge of potential members. It eliminates the need to centralize the responsibility for ensuring consistency of view changes, as in [Ref. RB91], by maintaining the group view at each member ordered as a logical ring. Each member perceives the departure of a neighboring member, and joining members enter on one side of a virtual marker whose position is maintained by all the members. Agreement and commit actions are achieved using tokens circulated along the logical ring. The protocol is able to regenerate lost tokens and to ignore duplicate tokens generated during its operation.

A.     ASSUMPTIONS,  OVERVIEW,  AND  DEFINITIONS 

Our objective is to develop a group membership algorithm that can serve as the basis of fault-tolerant, process-group-based communication primitives such as those described in [Ref. B+90, BJ87, BSS91]. As mentioned previously, it is assumed that every membership view at a member is assigned a view number and that views corresponding to successive numbers differ by exactly one change (either the deletion or the addition of a member). Reliable FIFO communication channels are assumed between any two operational processors. All failures are assumed to be crash or fail-stop failures [Ref. Cri88]. This implies that a message that has been sent fails to be delivered only if its receiver fails; it may, however, be arbitrarily delayed. Continuous changes to the membership are allowed; however, the changes are committed one at a time. A member is added to the group when its join request is processed and is deleted from the group when its departure is perceived. It is assumed that the group name is public to the processes that may wish to join the group, and that some mechanism exists whereby a process wishing to join can query the operating system at a site to learn whether the site is running a member process of the named group.
1.      Overview 

The Group Membership Protocol (MP) guarantees that the changes to the group view, and their sequence, are identical at every operational member. Using the view number in all group-related communication then makes fault-tolerant group communication achievable. The principal feature of the MP is that there is no central element either to detect a change in a member's membership status or to guarantee the consistency of a commit action on the group membership. Both are achieved in a distributed manner using a logical ring, which is simply a conceptual circular ordering of the members; it has no relation to their physical locations. Given such a ring and a direction of traversal (clockwise is selected for no particular reason), each member periodically queries its counter-clockwise neighbor for its status. The neighbor responds with a status message when it receives this query and, in its turn, sends a status query to its own counter-clockwise neighbor. Thus, every member monitors one other member and is itself monitored by a third member. For example, if there are 6 members $p_0$ to $p_5$, a logical ring can be configured in which $p_0$ is the counter-clockwise neighbor of $p_1$ and the clockwise neighbor of $p_5$, $p_1$ is the counter-clockwise neighbor of $p_2$ and the clockwise neighbor of $p_0$, and so on. $p_1$ sends a status query to $p_0$, and $p_0$ responds with a status message to $p_1$; that is, $p_0$ is monitored by $p_1$. This is illustrated in Fig. 3.1. Every member periodically sends a status query and receives a message indicating that the monitored member continues to be a group member. Initially, the ring configuration is known to all the members. As members change status, either voluntarily or involuntarily, the ring configuration changes. The protocol maintains sufficient information at each operational member to enable it to determine the processor it must monitor.

Figure 3.1: A Logical ring
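The monitoring arrangement just described can be sketched in a few lines. This is a minimal illustration, assuming the ring is held as a Python list of member names in clockwise order; the function names are hypothetical and not part of the protocol's specification.

```python
def cw_neighbor(ring, member):
    """Clockwise neighbor of member; ring lists members in clockwise order."""
    return ring[(ring.index(member) + 1) % len(ring)]


def acw_neighbor(ring, member):
    """Counter-clockwise neighbor of member, i.e. the member it monitors."""
    return ring[(ring.index(member) - 1) % len(ring)]


def monitoring_map(ring):
    """Map each member to the member it periodically queries for status."""
    return {p: acw_neighbor(ring, p) for p in ring}


ring = ["p0", "p1", "p2", "p3", "p4", "p5"]  # hypothetical clockwise order
mon = monitoring_map(ring)
print(mon["p1"])                # p0: p1 queries p0
print(cw_neighbor(ring, "p5"))  # p0: p0 is the clockwise neighbor of p5
# Each member monitors exactly one other and is monitored by exactly one other.
assert sorted(mon.values()) == sorted(ring)
```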

The MP treats a member leaving the group in the same manner as a member joining the group.¹ When a member departs voluntarily, it simply stops responding to the status query from its monitor; upon failure, it is unable to respond. In either case, if a monitor does not receive a status message within a certain time interval after its query, the monitored member is perceived to have left the group, and the algorithms that ensure that all the operational members consistently commit to this change are invoked. When a member recovers or wishes to join anew, it sends a join request to the first group member it can locate. To find one, the process wishing to join interrogates all the sites that could be running a member of the group; it is assumed that the operating system at each site knows all the groups of which the processes executing on it are members. The MP guarantees that only one of the operational members of the group processes the join request. There are two phases in the protocol to process a join or a departure, viz., the agreement phase and the commit phase. These phases are token-based and guarantee that no tokens are lost due to departures. They also guarantee that the protocol is robust with respect to the generation and processing of duplicate tokens.

¹ Failures amount to a member leaving involuntarily, and recoveries amount to a member joining as a new one.
a.      Processing  of  Individual  Changes 

Simple  cases  of  individual  changes  are  first  illustrated  to  orient  the 
reader  and  a  more  detailed  description  is  given  in  the  next  section. 

A single departure is processed as follows. Once a member perceives the departure of its monitored member, because it has not received a status message within a predetermined time interval after its query, it initiates the agreement phase by sending an agreement token to its clockwise neighbor. It also starts monitoring the counter-clockwise neighbor of the member perceived to have departed. Each member passes the agreement token on to its clockwise neighbor, so the token travels clockwise around the ring. When it returns to the agreement initiator, it has gone completely around the ring once, and all the operational members have information indicating that the group has reached agreement on the perceived failure. The agreement initiator then starts the commit phase by generating a commit token, which is circulated around the ring in the same manner as in the agreement phase. Every member receiving this token commits the change by removing the departed member from its group view and updating its view number.
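The two-phase processing of a single departure can be simulated sequentially. The sketch below is a hypothetical, failure-free model: it assumes no further departures occur during either phase, represents the ring as a clockwise list, and counts one message per token hop.

```python
def process_departure(ring, departed):
    """Run agreement and commit for one perceived departure, with no further failures.

    ring: members in clockwise order. Returns the initiator, every survivor's
    (view number, view) pair, and the number of token hops used.
    """
    survivors = [p for p in ring if p != departed]
    # The departed member's monitor is its clockwise neighbor; it initiates.
    initiator = ring[(ring.index(departed) + 1) % len(ring)]
    start = survivors.index(initiator)
    tour = survivors[start:] + survivors[:start]  # clockwise from the initiator
    views = {p: (0, frozenset(ring)) for p in survivors}
    hops = 0

    # Agreement phase: the token visits each survivor and returns to the initiator.
    agreed = set()
    for p in tour[1:] + [initiator]:
        agreed.add(p)
        hops += 1
    assert agreed == set(survivors)  # everyone has seen the agreement token

    # Commit phase: the initiator commits locally, then the commit token goes
    # around once more; each receiver removes the departed member and bumps vn.
    for p in tour:
        vn, gv = views[p]
        views[p] = (vn + 1, gv - {departed})
        hops += 1  # one hop to the next member (the last hop returns to the initiator)
    return initiator, views, hops


initiator, views, hops = process_departure(["p0", "p1", "p2", "p3"], "p1")
print(initiator)               # p2
print(views["p0"][0])          # 1
print(sorted(views["p0"][1]))  # ['p0', 'p2', 'p3']
print(hops)                    # 6, i.e. 2n for n = 3 surviving members
```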

A join is processed as follows. At initialization, the protocol places a logical marker in the ring between some pair of adjacent operational members. The clockwise member of this pair is designated as the host of the logical ring and is initially known to all members. As shown in Fig. 3.1, a new member always enters as the counter-clockwise neighbor of the host, which is responsible for carrying out the agreement and commit phases for the new member. The host makes the incoming member its monitored neighbor and delivers its local membership view, view number, and other related information to it. It should be noted that designating a member as the host merely determines who will initiate the agreement phase for a join request and does not represent the centralization of any function. Although the host designation may move from one member to another in the clockwise direction due to the departure of the host member itself, the protocol enables all the operational members to maintain knowledge of the current host of the ring.
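The routing of a join request to the host and the placement of the new member can be sketched as follows. This is a minimal illustration under stated assumptions: the ring is a clockwise list, a non-host receiver forwards the request clockwise (as its cwnbr would), and the helper names are hypothetical.

```python
def route_join_request(ring, receiver, host):
    """Members that handle a join request, forwarded clockwise until it reaches host."""
    path = [receiver]
    i = ring.index(receiver)
    while ring[i] != host:
        i = (i + 1) % len(ring)
        path.append(ring[i])
    return path


def admit_join(ring, host, new_member):
    """Return a new ring with new_member as the counter-clockwise neighbor of host."""
    i = ring.index(host)
    return ring[:i] + [new_member] + ring[i:]


ring = ["p0", "p1", "p2", "p3"]  # clockwise; p2 assumed to be the host
print(route_join_request(ring, "p3", "p2"))  # ['p3', 'p0', 'p1', 'p2']
new_ring = admit_join(ring, "p2", "pnew")
print(new_ring)                               # ['p0', 'p1', 'pnew', 'p2', 'p3']
```

Because the new member sits immediately counter-clockwise of the host, the host becomes its monitor, and the member it used to monitor is now monitored by the newcomer.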
2.      Definitions 

Certain  items  of  information  are  maintained  locally  at  each  member  to 
ensure  the  correctness  of  the  protocol.  All  members  maintain  a  set  corresponding 
to  the  current  group  view  containing  all  the  operational  members.  In  addition,  they 
maintain  a  status  table  locally  which  stores  the  perceived  state  of  all  the  members  that 
are  in  the  process  of  departing  or  joining.  This  table  is  used  by  a  member  to  reject 
any  duplicate  tokens  generated  due  to  the  departure  of  a  member  in  the  ring  in  the 
middle  of  any  phase.  There  is  a  pool  of  all  the  tokens  received  by  a  member  wherein 
all  the  tokens  transferred  to  the  neighbor  are  stored  until  removed  by  the  update 
policy  described  later.  This  pool  is  maintained  in  the  order  of  receipt  and  is  managed 
so  that  no  token  is  lost  upon  the  failure  of  a  member.  Using  the  current  group  view 
and  the  status  table,  each  member  determines  the  member  it  must  monitor. 
a.      Group  Membership  Problem  Definition 

Every member $p_i$ associates an integer $vn$ with its current group view, denoted by the set $GV_{vn}(p_i)$, and increments it by one for every view change committed. Solution of the group membership problem requires that

$$\forall p_j \in GV_{vn}(p_i) \text{ and } \forall n < vn,\quad GV_n(p_j) = GV_n(p_i).$$

Therefore, a group membership protocol is correct if it guarantees the above. In the following, unless necessitated by the context, the view number will be dropped as a subscript.
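The correctness condition can be checked mechanically on recorded view histories. The sketch below is illustrative only; it assumes each member's history is a dictionary from view number to a frozenset of members, and it skips view numbers that a member never recorded (for instance because it joined later), which is an assumption of the sketch rather than part of the definition above.

```python
def satisfies_gmp(history):
    """Check the group membership condition on recorded view histories.

    history[p] maps each view number n recorded at member p to GV_n(p).
    For every p_i with current view number vn and every p_j in GV_vn(p_i),
    GV_n(p_j) must equal GV_n(p_i) for each n < vn. Views that p_j never
    recorded are skipped (an assumption of this sketch).
    """
    for pi, views_i in history.items():
        vn = max(views_i)  # current view number at p_i
        for pj in views_i[vn]:
            views_j = history.get(pj, {})
            for n, gv in views_i.items():
                if n < vn and n in views_j and views_j[n] != gv:
                    return False
    return True


ok = {
    "p0": {0: frozenset({"p0", "p1", "p2"}), 1: frozenset({"p0", "p2"})},
    "p2": {0: frozenset({"p0", "p1", "p2"}), 1: frozenset({"p0", "p2"})},
}
bad = {
    "p0": {0: frozenset({"p0", "p1", "p2"}), 1: frozenset({"p0", "p2"})},
    "p2": {0: frozenset({"p0", "p2"}), 1: frozenset({"p0", "p2"})},
}
print(satisfies_gmp(ok), satisfies_gmp(bad))  # True False
```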

b.      Logical  Ring 

Assume a set of members, $GV = \{p_0, p_1, p_2, \ldots, p_{n-1}\}$, forming a logical ring. A logical ring is simply a circular sequence of these members regardless of their physical interconnection. Members along the ring can be visited by traversing it either clockwise or counter-clockwise. Given such a ring, a direction of traversal, and a member, say $p_i$, we order the members by starting at $p_i$, visiting each remaining member once along the ring, in order, and returning to $p_i$ from the last member visited. Using this ordering of members, the following relation can be defined.

Ring Relation (RR): Given two members $p_j, p_k \in GV$, $p_j \xrightarrow{p_i} p_k$ (read as "$p_j$ is followed by $p_k$ with respect to $p_i$") if $p_k$ is visited after $p_j$ when starting from $p_i$.

Clearly, given a ring and a direction of traversal, such a relation can be defined with respect to every member in $GV$. Conversely, given the above ring relation for any $p_i$, the logical ring has the following ring property:

$$\forall p_i, p_j, p_k \in GV,\quad p_j \xrightarrow{p_i} p_k \implies p_k \xrightarrow{p_j} p_i \text{ and } p_i \xrightarrow{p_k} p_j.$$
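The ring relation and the ring property can be checked exhaustively on a small ring. This sketch assumes a clockwise list and restricts the relation to distinct members; `follows` is a hypothetical name for $p_j \xrightarrow{p_i} p_k$.

```python
from itertools import permutations


def follows(ring, pi, pj, pk):
    """Ring relation pj ->(pi) pk for distinct members, traversing clockwise.

    Starting from pi, pk is visited after pj exactly when pk's clockwise
    distance from pi exceeds pj's.
    """
    n = len(ring)
    start = ring.index(pi)

    def dist(p):
        return (ring.index(p) - start) % n

    return dist(pj) < dist(pk)


ring = ["p0", "p1", "p2", "p3", "p4"]  # hypothetical clockwise order
print(follows(ring, "p2", "p4", "p0"))  # True: from p2 the order is p3, p4, p0, p1

# Check the ring property on every triple of distinct members.
for pi, pj, pk in permutations(ring, 3):
    if follows(ring, pi, pj, pk):
        assert follows(ring, pj, pk, pi) and follows(ring, pk, pi, pj)
```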

Every member orders its own group view as a logical ring with the above property. For a logical ring, we define a logical marker along the ring that does not move, although its adjacent members may change due to departures and joins. Every member $p_i$ keeps track of the position of the logical marker by ordering $GV(p_i)$ as a logical ring with respect to $p_{host}$, where $p_{host}$ is the first operational member clockwise from the logical marker. Every $p_j \in GV(p_i)$ has a rank, $rank_{p_i}(p_j)$, defined as its clockwise distance from $p_{host}$ along the ring, so that $rank_{p_i}(p_{host}) = 0$. Every $p_i$ maintains $p_{mon}$ as the last member to query it for its health.
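Rank as defined above is a modular distance. A minimal sketch, assuming a clockwise list and a hypothetical choice of host:

```python
def rank(ring, host, p):
    """rank(p): clockwise distance of p from the host, so rank(host) == 0."""
    return (ring.index(p) - ring.index(host)) % len(ring)


ring = ["p0", "p1", "p2", "p3", "p4", "p5"]  # clockwise order
host = "p4"  # hypothetical: first member clockwise from the logical marker
print([rank(ring, host, p) for p in ring])  # [2, 3, 4, 5, 0, 1]
# The host's counter-clockwise neighbor, where new members enter, ranks highest.
assert max(ring, key=lambda p: rank(ring, host, p)) == "p3"
```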
c.      Tokens 

The proposed protocol is based on token circulation to achieve agreement and consistent commit actions among members. The agreement token initiated at $p_i$ for a member $p_j$ perceived to have departed or joined is denoted $agree_{p_i}(p_j)$. Similarly, the commit token initiated at $p_i$ for $p_j$ is denoted $commit_{p_i}(p_j)$. When a potential member $p_j$ wishes to join the group and its request is received by a member $p_i$ other than the host, $p_i$ sends a join request token, denoted $joinreq_{p_i}(p_j)$, toward the host. It should be noted that the initiators of the agreement and commit tokens for a given change need not be identical, nor need they be the members that perceived the change in the first place. For example, $p_2$ might perceive the failure of its neighbor $p_1$ and then fail itself before initiating the agreement phase. Its neighbor $p_3$ would then first initiate agreement processing for $p_2$ and then initiate agreement for $p_1$. If $p_3$ fails before the agreement phase is complete, its neighbor $p_4$ would commit the failures of $p_1$, $p_2$, and $p_3$.

Every token carries information about whether it is for a departure or a join. Every member $p_i$ maintains a local status table, denoted $ST_{p_i}$. A member has an entry in this table at $p_i$ only if it has been perceived to have departed but is not yet committed out of $GV(p_i)$, or if it has been perceived to have joined but is not yet committed into $GV(p_i)$. This property is important, since the correctness of the protocol depends upon it. The five possible entries of $ST_{p_i}(p_j)$ are DepartureAgreed, JoinAgreed, DeparturePending, JoinRequested, and JoinPending.

A DepartureAgreed entry signifies that the agreement token for the departing member has been initiated, but the departure has yet to be committed and the member removed from the group view. A JoinAgreed entry is the same as DepartureAgreed except that it is for joining the group instead of leaving it. DeparturePending and JoinPending entries signify that the agreement phase has completed but that other changes must be committed before this one, so that all member sites commit the changes in the same order. A JoinRequested entry signifies that a potential member has sent a request to join the group and that this member has passed the request on to its neighbor on the way to the host. Every member $p_i$ maintains a pool of all the tokens it receives, denoted $TknPool(p_i)$, in the order they are received. Tokens are deleted from this pool carefully, because the receiver of a token may depart just before or just after receiving it, and the token could then be lost. The principle followed in token deletion is to retain a token at a member until it is guaranteed that its use is complete. The TknPool update policy is described in the next section.
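The per-member state defined above can be modelled with plain data classes. This is a hypothetical layout, not the thesis implementation: the token fields, the string encoding of token kinds, and detecting duplicates by equality against TknPool are assumptions of the sketch (the protocol also relies on the status table to reject duplicates).

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """The five possible entries of ST_pi(pj)."""
    DEPARTURE_AGREED = "DepartureAgreed"
    JOIN_AGREED = "JoinAgreed"
    DEPARTURE_PENDING = "DeparturePending"
    JOIN_REQUESTED = "JoinRequested"
    JOIN_PENDING = "JoinPending"


@dataclass(frozen=True)
class Token:
    kind: str        # "agree", "commit" or "joinreq" (hypothetical encoding)
    initiator: str   # member that generated the token
    subject: str     # member perceived to have departed or joined
    departure: bool  # True for a departure, False for a join


@dataclass
class MemberState:
    view: list                                # GV(pi), in clockwise order
    vn: int = 0                               # current view number
    st: dict = field(default_factory=dict)    # ST_pi: member -> Status
    pool: list = field(default_factory=list)  # TknPool(pi), in arrival order

    def receive(self, token):
        """Keep a token in arrival order; report False if it is a duplicate."""
        if token in self.pool:
            return False
        self.pool.append(token)
        return True


s = MemberState(view=["p0", "p1", "p2"])
t = Token("agree", "p2", "p1", True)
print(s.receive(t), s.receive(t))  # True False
s.st[t.subject] = Status.DEPARTURE_AGREED
```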

d.      Neighbor  and  Host  Computation 

The following rules determine $p_{host}(p_i)$, the clockwise neighbor $cwnbr(p_i)$, and the counter-clockwise neighbor $acwnbr(p_i)$, using the ring relation on $GV(p_i)$ and the status table $ST_{p_i}$.

• Rule to determine a new $p_{host}$: At $p_i$, $p_{host} = p_j \in GV(p_i)$ such that

$$\forall p_k\,(\neq p_j) \in GV(p_i),\quad p_j \xrightarrow{p_{old}} p_k,$$

where $p_{old}$ is the old host. This rule assigns the operational clockwise neighbor of $p_{old}$ as the new $p_{host}$ and is invoked to compute the new host every time a member commits the departure of its $p_{host}$. It should be noted that the new host is determined only by the current $GV(p_i)$ and not by $ST_{p_i}$. Since all the group views are consistent, this ensures that all the members arrive at the same $p_{host}$.
Time of application: This rule is applied whenever the removal of a member is committed.

• Rule to determine $cwnbr(p_i)$: The clockwise neighbor is always the member from which the status query is received: $cwnbr(p_i) = p_{mon}$.
Time of application: This rule is applied whenever a status query arrives from a member other than the current cwnbr.

• Rule to determine $acwnbr(p_i)$: $acwnbr(p_i) = p_j \in GV(p_i)$ such that $p_j \notin ST_{p_i}$ and

$$\forall p_k\,(\neq p_i, p_j) \in GV(p_i) \text{ with } p_k \notin ST_{p_i},\quad p_k \xrightarrow{p_i} p_j;$$

that is, $p_j$ is the nearest member counter-clockwise from $p_i$ that has no entry in $ST_{p_i}$.
Exception: If $p_i = p_{host}$ and there exists a $p_j$ such that $ST_{p_i}(p_j)$ changes from JoinAgreed to JoinPending or is committed, then $acwnbr(p_i) = p_j$. Upon a join, this ensures that $p_{host}$ determines the correct member to monitor.
Time of application: This rule is applied whenever the status report from the current acwnbr times out, and whenever a removal or join is committed.
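The host and counter-clockwise neighbor rules translate into short scans around the ring. The sketch below assumes clockwise lists and hypothetical function names; it does not model the host's join exception, and `cwnbr` is omitted because it is simply the recorded $p_{mon}$.

```python
def new_host(old_view, new_view, old_host):
    """Host rule: first member of the current view clockwise from the old host.

    old_view still contains old_host (clockwise order); new_view is GV(pi)
    after the removal. Only GV is consulted, never ST, so every member that
    holds the same view computes the same host.
    """
    n = len(old_view)
    i = old_view.index(old_host)
    for step in range(1, n):
        candidate = old_view[(i + step) % n]
        if candidate in new_view:
            return candidate
    raise ValueError("no operational member left")


def acwnbr(view, pi, st):
    """Nearest member counter-clockwise from pi that has no entry in ST_pi.

    The host's join exception is not modelled in this sketch.
    """
    n = len(view)
    i = view.index(pi)
    for step in range(1, n):
        candidate = view[(i - step) % n]
        if candidate not in st:
            return candidate
    return None  # every other member is in the middle of a status change


view = ["p0", "p1", "p2", "p3", "p4"]
print(acwnbr(view, "p2", {"p1": "DepartureAgreed"}))   # p0
print(new_host(view, ["p0", "p2", "p3", "p4"], "p1"))  # p2
```

Skipping members with status-table entries is what makes a monitor move on to the counter-clockwise neighbor of a member it has just perceived to depart.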

B.     THE  MEMBERSHIP  PROTOCOL 

For a departure, the MP at a member is activated either by the non-receipt of the status response from its acwnbr, the monitored member, or by the receipt of a departure agreement token from its acwnbr. In the case of a join, it is activated if the member is the $p_{host}$ and receives a JoinReceived token from its acwnbr. JoinReceived token processing is described in greater detail in Chapter IV.

We first describe the change detection instruments of this protocol, followed by the agreement and commit algorithms executed at every member. It should be remembered that the membership view at $p_i$ is arranged as a logical ring, and therefore the ring relation is defined on it. Also, every member places a logical marker on its own logical ring.

1.      Status  Change  Detection  and  Agreement  Initiation 

Figure 3.2 shows the protocol each member executes to monitor its counter-clockwise neighbor and to initiate an agreement token if a departure is detected. The Monitor process is triggered by the local clock. The clockwise and counter-clockwise neighbors are computed according to the rules given earlier in every iteration of the while loop. If a status message is not received, the member shuts off communication with the member perceived to have departed (to prevent the receipt of an excessively delayed response), updates its local status table, generates an agreement token, adds it to its local token pool, and sends it to its clockwise neighbor.

Note  that  only  an  operational  member  that  does  not  have  an  entry  in  the 
status  table  is  determined  to  be  the  cwnbr  by  the  rules. 

If this member turns out to have already departed, the status reporting instrument shown in Fig. 3.3 ensures that the token will be sent on to the next clockwise operational member. When a member detects a change in the querying member, it sends its TknPool to the new querying member in addition to the status response. It recognizes such a change by comparing the querying member with $p_{mon}$. It should be noted that ReportStatus does not compute the clockwise neighbor, but simply responds to the sender of the query.

Similarly, when a member receives a join request, it executes the protocol specified in Fig. 3.4. A non-member wishing to join a group finds the nearest site running a process that is a member of the group it wants to join. It sends a join request to this member and waits a preset interval for notification that the request has been approved before resending the request. Duplicate requests are handled as described below. The member receiving the request does the following:
• If the request is not a duplicate, it generates a JoinReceived token containing the requester's address. If the request is a duplicate, the member ignores it.

• It enters this token in its TknPool, makes an entry in ST, and sends the token to its cwnbr.

Monitor process at p_i

 1  while (true)
 2      send status query to acwnbr(p_i);
 3      wait for Tpad;  /* local timeout interval */
 4      if (status message not received)
 5          shut off communication with acwnbr(p_i);
 6          ST_pi(acwnbr(p_i)) <- DepartureAgreed;
 7          generate agree_pi(acwnbr(p_i));
 8          add agree_pi(acwnbr(p_i)) to TknPool;
 9          send agree_pi(acwnbr(p_i)) to cwnbr(p_i);
10      else
11          wait for 1 query period;
12      end if;
13  end while;
    end Monitor.

Figure 3.2: Algorithm for monitoring and agreement initiation
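The Monitor and ReportStatus processes can be sketched as methods on an in-memory member object. This is a hypothetical single-process model: a member's `alive` flag stands in for "answers within the timeout", shutting off communication is not modelled, and tokens are plain tuples of the form ("agree", initiator, subject).

```python
class Member:
    """Hypothetical in-memory model of one member's Monitor and ReportStatus steps."""

    def __init__(self, name):
        self.name = name
        self.alive = True  # stands in for "answers within the timeout"
        self.st = {}       # ST_pi
        self.pool = []     # TknPool(pi)
        self.p_mon = None  # last member that queried this one

    def report_status(self, querier):
        """ReportStatus: a new querier also receives a copy of TknPool."""
        pool_copy = None
        if querier != self.p_mon:
            pool_copy = list(self.pool)
            self.p_mon = querier
        return "status", pool_copy

    def monitor_once(self, acw, cw):
        """One Monitor iteration; returns (destination, token) on a perceived departure."""
        if acw.alive:
            acw.report_status(self.name)
            return None  # status received; wait one query period
        # Timeout: record the departure, keep the token, send it clockwise.
        self.st[acw.name] = "DepartureAgreed"
        token = ("agree", self.name, acw.name)
        self.pool.append(token)
        return cw.name, token


p0, p1, p2 = Member("p0"), Member("p1"), Member("p2")  # clockwise ring p0, p1, p2
print(p1.monitor_once(acw=p0, cw=p2))  # None: p0 answered
p1.alive = False
print(p2.monitor_once(acw=p1, cw=p0))  # ('p0', ('agree', 'p2', 'p1'))
```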

ReportStatus process at p_i

 1  if (querying member ≠ p_mon)
 2      send TknPool to the querying member;
 3      p_mon <- querying member;
 4  end if;
 5  send status to p_mon;
    end ReportStatus.

Figure 3.3: Algorithm for reporting the status

InitiateJoin for a request from p_new at p_i

 1  while (true)
 2      read TknPool for JoinRequests;
 3      if (p_host = p_i)
 4          generate agree_pi(p_new);
 5          ST_pi(p_new) <- JoinAgreed;
 6          add agree_pi(p_new) to TknPool;
 7          send agree_pi(p_new) to cwnbr(p_i);
 8      else send joinreq_pi(p_new) to cwnbr(p_i);
 9      end if;
10  end while;
    end InitiateJoin.

Figure 3.4: Algorithm to initiate a join

2.      The Agreement Phase

The algorithm used to process an agreement token is shown in Fig. 3.5. If a member that receives an agreement token for the first time is not its initiator, it simply adds the token to its token pool, updates its local status table, and passes the token on to its clockwise neighbor (lines 15-19 of Fig. 3.5). If, however, it is the initiator of the token, it must generate a commit token. A member must also generate a commit token if it receives a duplicate agreement token whose initiator has an entry in its status table denoting the initiator's failure (line 1, Fig. 3.5).
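The three cases for an arriving agreement token can be sketched as a single decision function. This is an illustration under stated assumptions, not Fig. 3.5 itself: tokens are (initiator, subject, is_departure) tuples, a duplicate is detected by its presence in TknPool, a duplicate from a live initiator is simply ignored, and a "commit" result still has to pass the rank check described in the commit phase before a commit token is actually generated.

```python
def handle_agreement(me, token, pool, st):
    """Process agree_{initiator}(subject) at member `me`.

    token: (initiator, subject, is_departure); pool: TknPool list; st: ST dict.
    Returns "commit", "forward" or "ignore".
    """
    initiator, subject, is_departure = token
    if initiator == me:
        return "commit"  # the token has gone all the way around the ring
    if token in pool:  # a duplicate agreement token
        failed = st.get(initiator) in ("DepartureAgreed", "DeparturePending")
        return "commit" if failed else "ignore"  # take over for a failed initiator
    pool.append(token)
    st[subject] = "DepartureAgreed" if is_departure else "JoinAgreed"
    return "forward"  # pass the token on to cwnbr(me)


pool, st = [], {}
t = ("p2", "p1", True)                      # agree_p2(p1) for a departure
print(handle_agreement("p3", t, pool, st))  # forward
print(handle_agreement("p3", t, pool, st))  # ignore
st["p2"] = "DepartureAgreed"                # p2, the initiator, is now perceived gone
print(handle_agreement("p3", t, pool, st))  # commit
```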

A member commits a change to its view when it sees a commit token for it. Therefore, the initiator of a commit token must commit the change locally in addition to generating and sending the token. There are two aspects to committing a change to the group view in this protocol. First, since the ring configuration may cause two commit tokens to arrive at two different members in opposite orders, the changes must be committed in a consistent order at all the members. Second, when a change is committed, it must be ensured that all the protocol-related entities are correctly updated. We refer to all the effects of committing a change collectively as CommitChange, whose steps are shown in Fig. 3.6.
3.      The Commit Phase

The processing of a commit token as it circulates around the ring is shown in Fig. 3.7. If the receiver is the commit initiator (the token has circulated back to its initiator) or if the commit token has been received before, the receiver simply exits; this indicates the completion of the protocol for that particular change. If the token is received at a member for the first time, the appropriate commit action must take place (line 4). After the change specified in this token is committed, a change whose commit-token generation was kept pending locally may now have the lowest rank, and so can now be committed and propagated. All such pending changes are processed at this point (lines 5-7). We now discuss the actions required for committing a change (CommitChange).
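The handling of an arriving commit token can be sketched as follows. This is a hypothetical simplification: tokens are (initiator, subject) tuples, `commit_change` is a caller-supplied callback standing in for CommitChange, and every change held as pending is assumed to become releasable once the received commit is applied, whereas the protocol re-applies the rank test to each held change.

```python
def handle_commit(me, token, seen, pending, commit_change):
    """Process commit_{initiator}(subject) at member `me`.

    seen: commit tokens already processed here; pending: subjects whose
    commit generation was held locally, assumed sorted by rank; commit_change:
    callback applying CommitChange. Returns the commit tokens to forward.
    """
    initiator, subject = token
    if initiator == me or token in seen:
        return []  # back at the initiator, or a repeat: this change is done
    seen.add(token)
    commit_change(subject)
    out = [token]  # pass the received token on
    # Simplification: every held change is assumed releasable once this commits.
    while pending:
        held = pending.pop(0)
        commit_change(held)
        own = (me, held)
        seen.add(own)
        out.append(own)
    return out


log = []
print(handle_commit("p3", ("p1", "p0"), set(), ["p5"], log.append))
# [('p1', 'p0'), ('p3', 'p5')]
print(log)  # ['p0', 'p5']
```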

a.      Effects  of  a  Commit  Action 

All the effects of a commit action are shown in Fig. 3.6 as CommitChange for $commit_{p_j}(p_k)$ received at $p_i$. The straightforward effects are the deletion of $p_k$ from the group view at $p_i$, the update of $p_i$'s local status table, the increment of its view number, and the passing of the token on to $p_i$'s clockwise neighbor. Three other important effects must take place when a change is committed. First, the member must determine a new host, $p_{host}$, for the ring (line 8) according to the rule given at the end of Section 2. Second, it must take the appropriate action if the committed change is a join (lines 9-11). The additional function performed when committing a join is to send the current group view, view number, local status table, and token pool to the joining member. This is essential to ensure that the new member has up-to-date, consistent information about the group at the time of joining. Receiving this information from $p_{host}$, which is clockwise from the new member, guarantees that the new member behaves consistently with the host. Finally, committing a change locally presents an opportunity to correctly update the local TknPool (lines 5-7). The principle followed in this update is that a token is deleted from the TknPool only when the member is certain that its use is over. The TknPool allows all the tokens in it to be inspected and keeps them ordered according to their arrival. As specified by the ReportStatus process of Fig. 3.3, the entire TknPool at a member is sent whenever its cwnbr changes. This happens when a member that perceives the departure of its counter-clockwise neighbor establishes a new counter-clockwise neighbor by querying it for status. The new counter-clockwise neighbor sees a change in the member querying it and therefore sends its TknPool to the new monitor.
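The view-related effects of CommitChange can be sketched on a dictionary of member state. This is a hypothetical layout covering only the group view, view number, status table, host recomputation, and the host's snapshot for a joining member; pruning of the TknPool is deliberately omitted.

```python
def commit_change(me, state, subject, is_departure):
    """Apply CommitChange for `subject` at member `me` (hypothetical state layout).

    state: dict with "view" (clockwise list), "vn", "st", "host" and "pool".
    Returns the snapshot the host sends to a joining member, otherwise None.
    """
    view = state["view"]
    if is_departure:
        if subject == state["host"]:
            # Host rule: the next member of GV clockwise from the old host.
            state["host"] = view[(view.index(subject) + 1) % len(view)]
        view.remove(subject)
    else:
        # A joining member enters as the counter-clockwise neighbor of the host.
        view.insert(view.index(state["host"]), subject)
    state["vn"] += 1
    state["st"].pop(subject, None)  # ST holds only changes not yet committed
    if not is_departure and me == state["host"]:
        return {"view": list(view), "vn": state["vn"],
                "st": dict(state["st"]), "pool": list(state["pool"])}
    return None


s = {"view": ["p0", "p1", "p2", "p3"], "vn": 4,
     "st": {"p1": "DepartureAgreed"}, "host": "p1", "pool": []}
commit_change("p2", s, "p1", True)
print(s["host"], s["view"], s["vn"])  # p2 ['p0', 'p2', 'p3'] 5
snap = commit_change("p2", s, "pnew", False)
print(snap["view"])                   # ['p0', 'pnew', 'p2', 'p3']
```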

b.      Ensuring  an  Identical  Sequence  of  Commits 

As members perceive departures and joins around the ring, they initiate agreement phases independently. It is therefore possible in this protocol for multiple agreement phases to proceed around the ring simultaneously, resulting in two commit tokens circulating around the ring at the same time. The two commit initiators divide the ring into two arcs, and the members on one arc receive the two commit tokens in the opposite order from the members on the other. An identical order is nevertheless maintained in this situation, as specified by lines 2-12 of Fig. 3.5. When a commit token is to be generated, the commit initiator first checks whether there are any unprocessed agreement tokens in its token pool. If there are none, a commit token is generated and the change is committed (lines 3-4). If there are, the commit initiator determines whether the member for which a commit is to be initiated has the smallest rank among all the members for which there are unprocessed agreement tokens (lines 6-9).² Recall that the rank of a member is its distance from phost in the clockwise direction. If the rank is not the smallest, the local status is marked as pending (line 11), and the change is committed and propagated at a later time. Thus, the use of rank ensures that all the members commit in the same order around the ring. Note that the pending status for a change is marked only at the commit initiator.

² Agreement tokens for joins in the pool do not matter because members always join with the highest rank.
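The rank test of lines 6-11 of Fig. 3.5 can be sketched in C as follows. The sketch assumes that the group view is held as an array listing the members in clockwise order, so that a member's rank is its index distance from phost modulo the group size; the function names are hypothetical. In line with footnote 2, a member that is not yet in the view (a joining member) is treated as having the highest rank, and agreement tokens for such members never block a commit.

```c
#include <assert.h>
#include <stddef.h>

/* Rank = clockwise distance from the host; -1 if m is not in the view.
   gv[] is assumed to list the members in clockwise order. */
int rank_of(const long *gv, size_t n, long host, long m)
{
    size_t h = n, i = n, k;

    for (k = 0; k < n; k++) {
        if (gv[k] == host) h = k;
        if (gv[k] == m)    i = k;
    }
    if (h == n || i == n)
        return -1;
    return (int)((i + n - h) % n);
}

/* Returns 1 if the commit for pk may be generated now (lines 7-9),
   0 if its status should be marked pending instead (line 11).
   agreed[] holds members with unprocessed agreement tokens. */
int may_commit_now(const long *gv, size_t n, long host, long pk,
                   const long *agreed, size_t n_agreed)
{
    int rk = rank_of(gv, n, host, pk);
    size_t j;

    assert(n > 0);
    if (rk < 0)
        rk = (int)n;            /* joining member: highest possible rank */
    for (j = 0; j < n_agreed; j++) {
        int r = rank_of(gv, n, host, agreed[j]);
        if (r >= 0 && r < rk)   /* joins (r < 0) are ignored */
            return 0;
    }
    return 1;
}
```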

C.  CORRECTNESS  OF  THE  PROTOCOL 

We now prove several propositions concerning the correctness of the proposed protocol.

Proposition 1: No tokens are lost if a member updates its TknPool using CommitChange.

Proof: If pi receives commit_pj(pk), it is guaranteed to have received agree_pj(pk) at some earlier time, because the agreement phase precedes the commit phase. Moreover, agree_pj(pk) has circulated completely around the ring. Suppose there exists a commit_pl(pm) received at pi before agree_pj(pk). Then, between the arrivals of commit_pl(pm) and commit_pj(pk) at pi, a token, namely agree_pj(pk), has circulated completely around the ring. Because channels are FIFO, this implies that commit_pl(pm) has also circulated completely around the ring, regardless of the locations of pl, pj, and pi on the ring. Thus, commit_pl(pm) has served its purpose and can be deleted from the TknPool at pi. Therefore, both agree_pj(pk) and commit_pl(pm) have completed their use and can be deleted. Adding commit_pj(pk) to the TknPool at pi completes its update. If this token pool is sent to cwnbr(pi) according to ReportStatus, no tokens are lost. ■

Proposition 2: Exactly one pi determines itself to be phost.
Proof: CommitChange determines a host only when it commits a departure of the current phost. According to the rule for determining the new host, only the local group view is inspected, and the clockwise neighbor of the departed host is determined to be the new phost. According to Proposition 1, no tokens are lost. Therefore, the commit token for the departure of the old host is processed by every member. Since the host had rank 0, which is always the lowest, every member determines the same member as the new phost. ■

ProcessAgreementTkn for agree_pj(pk) at pi

     /* A commit must be generated either when I am the
        agreement initiator or when a duplicate token is received
        due to the departure of the agreement initiator pj */

1    if ((pi = pj) || ((pj ≠ pi) && (duplicate token) && (pj ∈ ST_pi)
1.1      && (pl ∈ ST_pi ∀ pl s.t. pl → pi)))
2        if (no unprocessed agreement token in TknPool)
3            generate commit_pi(pk);
4            CommitChange;
5        else
6            compute rank ∀ pl ∈ ST_pi with Agreed status;
7            if (rank(pk) is smallest)
8                generate commit_pi(pk);
9                CommitChange;
10           else
                 /* depending upon whether for join or departure of pk */
11               ST_pi(pk) ← DeparturePending or JoinPending;
12           end if;
13       end if;
14   else
15       if ((pj ≠ pi) && (not a duplicate agree_pj(pk)))
16           add agree_pj(pk) to TknPool;
17           ST_pi(pk) ← DepartureAgreed or JoinAgreed;
18           send agree_pj(pk) to cwnbr(pi);
19       end if;
20   end if;
end ProcessAgreementTkn.

Figure 3.5: Protocol for agreement tokens

CommitChange for commit_pj(pk) at pi

     /* depending on whether a join or a departure */
1    add or delete pk from GV(pi);
2    delete pk entry from ST_pi;
3    vn(pi) ← vn(pi) + 1;
4    send commit_pj(pk) to cwnbr(pi);
5    delete all commit tokens received before
         agree_pj(pk) from TknPool;
6    delete joinreq_pj(pk);
7    delete agree_pj(pk);
8    add commit_pj(pk) to TknPool;
9    determine new phost;
10   if ((join committed) && (pi = phost))
11       update acwnbr(pi);
12       send ST(pi), TknPool(pi) and GV(pi) to acwnbr(pi);
13   end if;
end CommitChange.

Figure 3.6: Protocol for committing a change

ProcessCommitTkn for commit_pj(pk) at pi

if ((pi = pj) || (duplicate))
    exit;
else
    CommitChange;
    while (∃ pl ∈ ST_pi with a higher rank && pending status
           received before agree_pj(pk))
        CommitChange;
    end while;
end if;
end ProcessCommitTkn.

Figure 3.7: Protocol to process a commit token
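A minimal C sketch of lines 1, 3 and 9 of CommitChange (Fig. 3.6) for a departure, including the host rule used in Proposition 2, is given below. It assumes a hypothetical `LocalView` that stores the members in clockwise order together with vn and phost; the status-table and TknPool steps are omitted.

```c
#include <assert.h>
#include <stddef.h>

#define GV_MAX 32

/* Hypothetical local view: members listed in clockwise order. */
typedef struct {
    long member[GV_MAX];
    size_t n;
    long vn;      /* view number vn(pi) */
    long host;    /* phost */
} LocalView;

/* Commit the departure of pk; returns 0 if pk is not in the view. */
int commit_departure(LocalView *v, long pk)
{
    size_t i, k;

    assert(v->n <= GV_MAX);
    for (i = 0; i < v->n && v->member[i] != pk; i++)
        ;                                   /* locate pk */
    if (i == v->n)
        return 0;
    /* Line 9 / Proposition 2: the clockwise neighbor of a departed host
       becomes the new host (a one-member view has no successor). */
    if (pk == v->host && v->n > 1)
        v->host = v->member[(i + 1) % v->n];
    for (k = i; k + 1 < v->n; k++)          /* line 1: delete pk from GV */
        v->member[k] = v->member[k + 1];
    v->n--;
    v->vn++;                                /* line 3: vn(pi) <- vn(pi) + 1 */
    return 1;
}
```

The new host is computed before pk is removed only because the sketch needs pk's array position; the result is the same clockwise neighbor the rule names.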

Proposition 3: An agreement phase is always started.
Proof: In the case of a departure perceived by a member, say pi, pi may itself depart either before initiating the agreement token or after sending it. In the latter case, the commit phase is carried out by cwnbr(pi). In the former case, cwnbr(pi) perceives the departure of pi and initiates an agreement phase for it. It then attempts to monitor acwnbr(pi), whose agreement phase pi could not initiate; cwnbr(pi) perceives acwnbr(pi) as departed as well and initiates an agreement phase for it. This sequence of events extends in the same way to a string of consecutive departures.

If pi is the host and fails before initiating the agreement phase for a join, cwnbr(pi) determines itself to be the new host and receives the JoinReceived token as part of the TknPool, from which it initiates the agreement phase. Since join requests are held in the TknPool and, by Proposition 1, no tokens are lost, no join requests are lost either. ■

Proposition 4: The joining member and phost behave consistently after the agreement initiation.

Proof: phost sends its GV, ST, TknPool, and vn to the joining member pnew. The exceptions to the rules for computing cwnbr and acwnbr ensure that the logical ring is correctly configured with pnew as the highest-rank member. When acwnbr(phost) before the join notices that the querying member is different from its pmon, it becomes aware of the new member in the ring and sends its TknPool to it. Therefore, all tokens that are passed to phost while the state transfer to pnew is taking place are sent to pnew. This ensures that pnew behaves consistently with phost. ■
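The ring configuration described in Proposition 4 can be illustrated with a small C sketch that places pnew immediately counter-clockwise of phost, so that pnew has the highest rank, cwnbr(pnew) = phost and acwnbr(phost) = pnew. The array layout (members in clockwise order) and all function names are assumptions made for the illustration; the sketch does not reproduce the exception rules of the protocol itself.

```c
#include <assert.h>
#include <stddef.h>

#define GV_MAX 32

/* Hypothetical ring: members listed in clockwise order. */
typedef struct {
    long member[GV_MAX];
    size_t n;
} Ring;

static size_t index_of(const Ring *r, long m)
{
    size_t i;
    for (i = 0; i < r->n; i++)
        if (r->member[i] == m)
            return i;
    return r->n;                    /* not found */
}

/* Clockwise neighbor of m. */
long cwnbr(const Ring *r, long m)
{
    size_t i = index_of(r, m);
    assert(i < r->n);
    return r->member[(i + 1) % r->n];
}

/* Counter-clockwise neighbor of m. */
long acwnbr(const Ring *r, long m)
{
    size_t i = index_of(r, m);
    assert(i < r->n);
    return r->member[(i + r->n - 1) % r->n];
}

/* Insert pnew just counter-clockwise of the host, giving it the highest rank. */
void insert_joiner(Ring *r, long host, long pnew)
{
    size_t h = index_of(r, host), k;

    assert(h < r->n && r->n < GV_MAX);
    for (k = r->n; k > h; k--)      /* shift host and its successors clockwise */
        r->member[k] = r->member[k - 1];
    r->member[h] = pnew;
    r->n++;
}
```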
Theorem 1: The proposed protocol correctly solves the GMP, stated as

$$\forall p_i \in GV_{vn}(p_j) \text{ and } \forall n < vn:\quad GV_n(p_j) = GV_n(p_i),$$

given that all members start with the same initial group view $GV_0$.
Proof: We provide a proof by induction.

Base Case: $\forall p_i, p_j \in GV_0(p_k)$, $GV_0(p_i) = GV_0(p_j)$ at system initialization.
Induction Hypothesis: Assume that, for some $k \geq 0$, $\forall p_i, p_j \in GV_k(p_j)$, $GV_k(p_i) = GV_k(p_j)$.

We now prove that the next change committed by any two members is identical. Consider any $p_i, p_j \in GV_{k+1}(p_j)$. Without loss of generality, let commit_pk(pl) be the next change to be committed by pj. There are two cases.
Case  1  -  pj  -4  pi'.  It  is  clear  from  the  change  detection  instruments  that  pj  -A  pt  and 
Pi  Q  pi.  Therefore,  if  a  change  involving  pi  is  view  change  {k  +  1)  committed  at  pj, 
either the only agreement token pk had at the time of initiating commit_pk(pl) was for pl, or pl had the smallest rank among all agreement tokens in the TknPool at pk. Now, a commit token initiated for pm such that pm → pl cannot result in view change (k + 1) at pi, because this would imply that pm has a lower rank at pi than pl, whose agreement token will be part of the TknPool at pi. The agreement token for pm would then also have been part of the TknPool at pk and would have had the smallest rank at the time commit_pk(pl) was initiated. This contradicts the fact that pl had the smallest rank at pk or was the only agreement token at pk. Therefore, view change (k + 1) committed at pi is due to commit_pk(pl).

Case 2: pi → pj. In this case, commit_pk(pl), which results in view change (k + 1) at pj, must first pass through pi, since pi → pj and tokens circulate in the clockwise direction. This implies that view change (k + 1) at pi is also due to commit_pk(pl).
Thus, given the induction hypothesis for view change k, we have proved that

$$\forall p_i, p_j \in GV_{k+1}(p_j):\quad GV_{k+1}(p_i) = GV_{k+1}(p_j).$$

This completes the proof by induction.
IV. IMPLEMENTATION OF THE PROTOCOL

In this chapter, the implementation aspects of our Group Membership Protocol (MP) are discussed. The major functions of the protocol are failure detection, agreement on failures, committing of failures, addition of members, and supplying the current view to application processes. This requires the protocol to communicate with application processes executing on the same member and with the MP at other members, and to act on the information received from other processes. The action taken by the MP depends on the data received. The data that it receives from and sends to the external world is depicted in Fig. 4.1. Based on this data flow, the software design for the protocol was developed as a set of interacting processes, each performing a unique function. The design used the Software Through Pictures utility [Ref. STP] to visualize the interactions and to check the consistency of the data exchanged. The following paragraphs give a more detailed picture of the implementation.

A.     PROTOCOL  SOFTWARE  DESIGN 

Fig. 4.1 shows the interaction of the Group Membership Protocol with the external world.

1.      Functions  in  the  Protocol 

The Group Membership Protocol is executed at all member sites of a process group. Fig. 4.1 shows the interaction of the MP with the application program executing at the same member site and with the MP executing at other member sites. New members and application requests for the current group view come from applications executing on the same host. The clockwise and anticlockwise neighbors represent the MP executing at other member sites. The protocol receives parameters (Token and TokenPool) from other member sites and updates its membership view accordingly. It sends a status query (Status-Query-out) to its counter-clockwise neighbor and awaits the reply (Status-Report-in). It receives a status query (Status-Query-in) from its clockwise neighbor and sends a reply (Status-Report-out) in response. When it receives a message requesting to join the group (Join-Requests), it acts on it in an appropriate manner; details are explained in the lower-level descriptions. When this member site joins a new group, it receives the initial parameters (Initial-Parameters-Receive) and initializes its parameters (GroupView, TokenPool, Status-Table). An application wanting the current members of the group (Group View) queries the MP to get the current membership.

Figure 4.1: Topmost view of MP interactions

2.      Subcomponents  of  MP 

Fig. 4.2 gives an overall view of the various functions in the MP software running at every member process. The functions are implemented by a number of sub-processes, whose names and functions are given in Table 4.1. A detailed process specification is given in the lower-level diagram corresponding to each process defined in this diagram.

B.     DATA  STRUCTURE  DEFINITIONS 

Different  data  structures  used  for  implementation  and  in  process  specifications 
are  described  below. 

Address: This is a special data type defined as a long. It is generated using Unix system calls from the conventional Internet address and the port address, which together yield a unique address. It is used for communication with processes spread over different hosts.
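The text does not spell out how the Internet address and the port are combined. A minimal sketch, assuming an IPv4 address and a 16-bit port packed into one integer, follows. It uses a 64-bit type rather than a long because, on a platform where long is 32 bits, a 32-bit address and a 16-bit port cannot both fit losslessly into it. The names `make_address`, `address_ip` and `address_port` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Address encoding: IPv4 address in bits 16..47,
   port in bits 0..15 (both in host byte order). */
typedef uint64_t Address;

Address make_address(uint32_t ipv4, uint16_t port)
{
    return ((Address)ipv4 << 16) | (Address)port;
}

uint32_t address_ip(Address a)
{
    return (uint32_t)(a >> 16);
}

uint16_t address_port(Address a)
{
    return (uint16_t)(a & 0xFFFFu);
}
```

With the BSD sockets API, the two inputs would typically come from the `sin_addr.s_addr` and `sin_port` fields of a `struct sockaddr_in`, converted from network byte order with `ntohl` and `ntohs`.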

Action-message: This is defined to represent several action-oriented messages on stored data. They include add, remove, and update of a linked list. The definition is given in Fig. 4.3. The different instances of this data type and their action types are given in Table 4.2.

Figure 4.2: MP process interactions

TABLE 4.1: PROCESSES AND THEIR FUNCTIONS

NAME                   FUNCTION
FIFO-Channel-Layer     This process is responsible for all the communication that MP has
                       with other processes external to it.
Initiate-Departure     This process is responsible for monitoring the health of the neighbor
                       and initiating the agreement phase if the neighbor fails.
Initiate-Join          This process receives join requests from new members and acts on them
                       in an appropriate manner.
Agreement-Process      This process receives and processes the agreement token.
Commit-Process         This process commits the removal or addition of a member from the
                       group view.
TokenPool-Manager      This process manages the token pool. It ensures that no tokens are
                       lost and that there are no duplicate tokens.
StatusTable-Manager    This process manages the status table and keeps track of the status
                       of all members in the group view.
GroupView-Manager      This process manages the group view and updates it when members leave
                       and join.
Join-Initial           This process receives the initialization parameters and initializes
                       the Status table, TokenPool and GroupView data structures in the new
                       member.

Action-message
    char[10]  Action-type;
    Address   Member-Address;

Figure 4.3: Action-type message structure

TABLE 4.2: DIFFERENT ACTION-ORIENTED DATA STRUCTURES

Data Element           Action-type
Initiate-Agreement     initagree
Initiate-Commit        initcomit
Neighbor-Status        statmembr
Send-Init-Param        sendinitp
Update-GroupView       updtgview
Update-Status-Table    rmvemembr
Update-TokenPool       uptknpool
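A sketch of the Action-message structure of Fig. 4.3 in C, with the action types of Table 4.2, is shown below. The 64-bit `Address` typedef and the helper `make_action` are assumptions for illustration. Every action type in Table 4.2 has nine characters, so it fits in the `char[10]` field together with its terminating NUL.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Address;            /* hypothetical address type */

/* Action-message of Fig. 4.3 */
typedef struct {
    char    action_type[10];         /* nine characters plus the terminating NUL */
    Address member_address;
} ActionMessage;

/* Action types of Table 4.2 */
static const char *const ACTION_TYPES[] = {
    "initagree", "initcomit", "statmembr", "sendinitp",
    "updtgview", "rmvemembr", "uptknpool"
};

/* Fill in an Action-message; returns 0 if the type is not in Table 4.2. */
int make_action(ActionMessage *m, const char *type, Address member)
{
    size_t i;

    for (i = 0; i < sizeof ACTION_TYPES / sizeof ACTION_TYPES[0]; i++) {
        if (strcmp(type, ACTION_TYPES[i]) == 0) {
            assert(strlen(type) < sizeof m->action_type);
            strcpy(m->action_type, type);
            m->member_address = member;
            return 1;
        }
    }
    return 0;
}
```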

GroupView-struct: This is defined in Fig. 4.4. The structure consists of View-Number and Group-Size as strings and an array of Addresses. The size of this array is equal to the value specified as a string in Group-Size. Initial-GroupView and Current-GroupView are instances of this data type.

Neighbor: This data type is used to define the various instances in which a neighbor address is required. It has two data elements, Initiator-Address and Neighbor-Address. The data is used to pass the address of the clockwise neighbor of the initiating member. It is defined in Fig. 4.5. NeighborAddress, Current-Neighbor, and Neighbor-Member are instances of this data type.
GroupView-struct
    char[5]   View-Number;
    char[3]   Group-Size;
    {
        Address   Member-Address;
    } [atoi(Group-Size)]

Figure 4.4: GroupView message structure
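Decoding a GroupView-struct message (Fig. 4.4) could look like the following C sketch. It assumes a hypothetical wire layout: a five-character decimal View-Number, a three-character decimal Group-Size, and then atoi(Group-Size) addresses, each copied as a raw 64-bit value in host byte order. The actual encoding of addresses on the wire is not specified here, so the layout, the `GroupViewMsg` type and the function names are all illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t Address;            /* hypothetical address type */

#define VN_LEN 5                     /* char[5] View-Number */
#define GS_LEN 3                     /* char[3] Group-Size  */
#define GV_MAX 32

typedef struct {
    int     view_number;
    int     group_size;
    Address member[GV_MAX];
} GroupViewMsg;

/* Copy a fixed-width text field into a NUL-terminated buffer and convert it. */
static int field_to_int(const char *p, size_t len)
{
    char buf[8];

    assert(len < sizeof buf);
    memcpy(buf, p, len);
    buf[len] = '\0';
    return atoi(buf);
}

/* Returns 0 on success, -1 if the buffer is too short or the size is invalid. */
int decode_group_view(const char *buf, size_t len, GroupViewMsg *gv)
{
    size_t need;
    int i;

    if (len < VN_LEN + GS_LEN)
        return -1;
    gv->view_number = field_to_int(buf, VN_LEN);
    gv->group_size  = field_to_int(buf + VN_LEN, GS_LEN);
    if (gv->group_size < 0 || gv->group_size > GV_MAX)
        return -1;
    need = VN_LEN + GS_LEN + (size_t)gv->group_size * sizeof(Address);
    if (len < need)
        return -1;
    for (i = 0; i < gv->group_size; i++)      /* the [atoi(Group-Size)] array */
        memcpy(&gv->member[i],
               buf + VN_LEN + GS_LEN + (size_t)i * sizeof(Address),
               sizeof(Address));
    return 0;
}
```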


Neighbor
    Address   Initiator-address;
    Address   Neighbor-address;

Figure 4.5: Neighbor-Address structure

StatusTable Structure: This is defined to specify the message structure for sending Status Table contents. It consists of a string of characters denoting the NumberOfEntries and an array of another structure whose elements are the member address and its status. The size of this array is given by the value specified in NumberOfEntries. It is defined in Fig. 4.6. Initial-Status-Table and Current-Status-Table are instances of this data type.

StatusTable-struct
    char[3]   NumberOfEntries;
    {
        Address   Member-Address;
        char[10]  Status-of-member;
    } [atoi(NumberOfEntries)]

Figure 4.6: StatusTable message structure

Token-data: This data type is defined to represent the multiple instances in which a token occurs. The structure consists of the elements initiator-address, member-address, and token-type. The token-type is used to distinguish between the various tokens. The definition is given in Fig. 4.7. The various instances are explained in the description of the data structure Token.

Token-struct
    Address   Initiator-address;
    Address   Member-Address;
    char[10]  Token-type;

Figure 4.7: Token structure

TokenPool-struct: This data type consists of a character string denoting the number of tokens and an array of Token-field of type Token-struct. The array size is given by the NumberOfTokens field. Fig. 4.8 gives the data definition. Initial-TokenPool and Current-TokenPool are instances of this data type.

TokenPool-struct
    char[3]   NumberOfTokens;
    {
        Token-struct  Token-field;
    } [atoi(NumberOfTokens)]

Figure 4.8: TokenPool message structure

GroupView:  This  is  a  linked  list  used  for  storing  the  group  view  at  each  process  site. 
The  data  structure  consists  of  view  number  and  group  size  as  a  character  string 
and  a  linked  list  of  member  address  and  next  member  pointer.  The  tail  of  list 
is  specified  by  null  address  in  next-member  field.  The  data  structure  is  defined 
in  Fig.  4.9. 

Initial-Parameters-Receive: This is a character string formed by concatenating Initial-GroupView, Initial-Status-Table and Initial-TokenPool into a single message.
GroupView
    char[5]   View-Number;
    char[3]   Group-size;
    Member-pointer {
        Address          Member-Address;
        Member-pointer   *Next-member;
    }

Figure 4.9: GroupView

Initial-Parameters-Send: This consists of the destination address of the message and a message formed by concatenating Current-GroupView, Current-TokenPool and Current-Status-Table.

Join-Requests: This has the requesting member's address and the name of the group to join as components. The Request-member-address is of type Address and Group-name is a character string.

Member-Status:  The  components  of  this  data  structure  are  member-address  and  a 
character  string  denoting  the  status  of  member  as  specified  in  status  table. 

Reset-timer:  This  is  a  message  string  "resetimer"  to  reset  the  watchdog  timer  used 
for  taking  periodic  actions. 

Status-message: This data type is defined to specify the data structure for query and response messages. The data definition is given in Fig. 4.10. Status-Query-in and Status-Query-out have the action field set to "statquery". Status-Report-in and Status-Report-out have the action field set to "statreprt".

Status-monitor message
    char[10]  action;
    Address   Initiator-Address;
    Address   Destination-Address;

Figure 4.10: Status-Monitoring message structure

Status-Table: This data is a linked list used for storing the status table at each process site. The data structure consists of the number of entries in the list and a linked list of member address, member status and next-entry pointer. A null address in the next-entry field denotes the tail of the list. The data structure is defined in Fig. 4.11. The different entries of member status are DepartureAgreed, JoinAgreed, DeparturePending, and JoinPending.

Status-Table
    char[3]   NumberOfEntries;
    Status-pointer {
        Address          Member-Address;
        char[10]         Member-status;
        Status-pointer   *Next-entry;
    }

Figure 4.11: Status-Table

Timeout-message:  This  is  a  message  "timeoutms"  to  denote  that  a  time  out  has 
occurred. 

Token: This is an instance of Token-struct. Token-type is a string denoting the type of the token. There are 5 types of tokens; Table 4.3 lists the tokens and the Token-type corresponding to each of them.

TokenPool: This data is a linked list used for storing the token pool at each process site. The data structure consists of the number of tokens as a character string and a linked list of member address, token-type and next-token pointer. The tail of the list is specified by a null address in the next-token field. The data structure is defined in Fig. 4.12.
TABLE 4.3: DIFFERENT TOKENS WITH THEIR TOKEN-TYPES

Tokens                      TokenType
Join Agreement Token        Joinagree
Failure Agreement Token     Failagree
Join Commit Token           Joincomit
Failure Commit Token        Failcomit
Join Requested Token        Joinreqst

TokenPool
    char[3]   NumberOfTokens;
    Token-pointer {
        Address         Member-Address;
        char[10]        Token-type;
        Token-pointer   *Next-token;
    }

Figure 4.12: TokenPool
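The Token-struct of Fig. 4.7 and the token types of Table 4.3 can be mapped onto C as in the sketch below. The enum `TokenType`, the 64-bit `Address` typedef and the helper functions are illustrative assumptions; only the nine-character type strings come from Table 4.3.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Address;            /* hypothetical address type */

typedef enum {
    TOK_UNKNOWN = -1,
    TOK_JOIN_AGREE, TOK_FAIL_AGREE, TOK_JOIN_COMMIT, TOK_FAIL_COMMIT, TOK_JOIN_REQ
} TokenType;

/* Token-struct of Fig. 4.7 */
typedef struct {
    Address initiator_address;
    Address member_address;
    char    token_type[10];          /* one of the Table 4.3 strings */
} TokenStruct;

/* Token-type strings of Table 4.3 */
static const struct { const char *name; TokenType type; } TOKEN_NAMES[] = {
    { "Joinagree", TOK_JOIN_AGREE  },
    { "Failagree", TOK_FAIL_AGREE  },
    { "Joincomit", TOK_JOIN_COMMIT },
    { "Failcomit", TOK_FAIL_COMMIT },
    { "Joinreqst", TOK_JOIN_REQ    },
};

/* Classify a token by its type string; never reads past the 10-byte field. */
TokenType token_type_of(const TokenStruct *t)
{
    size_t i;

    for (i = 0; i < sizeof TOKEN_NAMES / sizeof TOKEN_NAMES[0]; i++)
        if (strncmp(t->token_type, TOKEN_NAMES[i].name, sizeof t->token_type) == 0)
            return TOKEN_NAMES[i].type;
    return TOK_UNKNOWN;
}

int is_agreement(TokenType ty) { return ty == TOK_JOIN_AGREE || ty == TOK_FAIL_AGREE; }
int is_commit(TokenType ty)    { return ty == TOK_JOIN_COMMIT || ty == TOK_FAIL_COMMIT; }
```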
TokenStatus: This is a data structure consisting of a token-struct element specifying a particular token and a character string giving the status of the token as old or new.

Update-Status: This is a data structure consisting of the member-address of the member concerned and a character string giving the new status of the member.

C.      PROCESS  SPECIFICATIONS 

The individual processes are described in detail in the following paragraphs. The function of each process, along with its inputs and outputs, is described. The shared data managed, if any, is also specified, as is the algorithm used to implement the function.

1.      FIFO-Channel-Layer 

This process is responsible for all the communication with processes external to the MP executing at the member site. It receives Status-Query-out from the Initiate-Departure process and sends it to the counter-clockwise neighbor. It receives Status-Report-in from the counter-clockwise neighbor and passes it to the Initiate-Departure process. On receiving a Status-Query-in from the clockwise neighbor, it passes it to the Initiate-Departure process. It receives Status-Report-out from the Initiate-Departure process and sends it to the site address specified. If the member site is the host of the group, Initial-Parameters-Send is sent to the new member wanting to join the process group. If the member site is the new member wishing to join the group, this process receives Initial-Parameters-Receive and initializes the storage elements of the MP. Fig. 4.13 shows the interaction of the FIFO-Channel-Layer process with other processes.
Figure 4.13: FIFO channel process
Send process

while (true)
    Receive message from message queue;
    Extract the destination from the message;
    Send the message to the destination specified;
end while;
end Send.

Figure 4.14: Send process

This process is subdivided into the Send and Receive processes. The Send process has a message queue. All processes that wish to communicate with processes external to the MP, but that execute locally, send messages to the message queue. The destination part of each message specifies where it is to be sent. The Receive process receives message streams from other members. The type of each message is embedded in the message; this process scans the message type and passes the message stream to the appropriate process. The Send process has 12 data flows.

Input data flows are Status-Report-out, Token, TokenPool, Status-Query-out, Token and TokenPool.

Output data flows are Status-Report-out, Token, TokenPool, Status-Query-out, Token and TokenPool.

The algorithm used for implementing the process is given in Fig. 4.14. The Receive process has 9 data flows.

Input data flows are Status-Report-in, Token, TokenPool, Status-Query-in and Initial-Parameters-Receive.

Output data flows are Status-Query-in, Status-Report-in, Token and Initial-Parameters-Receive.
Receive process

while (true)
    Receive message from other members;
    Extract the type of message received;
    Identify the destination process;
    Pass the message to the destination process;
end while;
end Receive.

Figure 4.15: Receive process

The  algorithm  for  implementing  the  Receive  function  is  given  in  Fig.  4.15. 
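The type-based dispatch performed by the Receive process (Fig. 4.15) might be sketched in C as follows. It assumes, hypothetically, that every message begins with a nine-character type tag such as "statquery" or "Joinagree"; the routing of each tag to a destination process is likewise an assumption based on the process functions in Table 4.1, not a documented part of the implementation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical destination processes inside the MP. */
typedef enum {
    DEST_UNKNOWN, DEST_INITIATE_DEPARTURE, DEST_AGREEMENT, DEST_COMMIT, DEST_INITIATE_JOIN
} Destination;

#define TYPE_LEN 9   /* assumed: every message starts with a 9-character type tag */

static const struct { const char *tag; Destination dest; } ROUTES[] = {
    { "statquery", DEST_INITIATE_DEPARTURE },
    { "statreprt", DEST_INITIATE_DEPARTURE },
    { "Joinagree", DEST_AGREEMENT },
    { "Failagree", DEST_AGREEMENT },
    { "Joincomit", DEST_COMMIT },
    { "Failcomit", DEST_COMMIT },
    { "Joinreqst", DEST_INITIATE_JOIN },
};

/* Extract the type tag and identify the destination process. */
Destination route_message(const char *msg, size_t len)
{
    size_t i;

    if (len < TYPE_LEN)
        return DEST_UNKNOWN;
    for (i = 0; i < sizeof ROUTES / sizeof ROUTES[0]; i++)
        if (memcmp(msg, ROUTES[i].tag, TYPE_LEN) == 0)
            return ROUTES[i].dest;
    return DEST_UNKNOWN;
}
```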

2.      Initiate-Departure

This process checks the health of its counter-clockwise neighbor by processing the status reports received from it. It keeps track of the time elapsed since the last query was sent to the counter-clockwise neighbor. If the elapsed time is greater than a threshold, it takes the following actions.

1.  It  initiates  the  agreement  token  for  the  process  perceived  to  have  failed. 

2.  It updates the address of the process to which queries are sent, based on the Group View and Status-Table entries.

3.  It  updates  the  local  Status-Table  and  TokenPool. 

If it receives a query from a process other than its current clockwise neighbor, it records that process as its new clockwise neighbor and takes the following actions.

1.  It  sends  status  report  to  the  new  querying  process. 

2.  It  sends  the  Current- TokenPool  to  the  new  querying  process. 
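The second action taken on a timeout, choosing the next process to query from the Group View and Status-Table entries, could be sketched in C as below. The sketch assumes that the group view is an array in clockwise order and that a member whose status-table entry is DepartureAgreed or DeparturePending should be skipped; the enum, the types and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    DEPARTURE_AGREED, DEPARTURE_PENDING, JOIN_AGREED, JOIN_PENDING
} MemberStatus;

typedef struct {
    long member;
    MemberStatus status;
} StatusEntry;

/* A member with a departure recorded in the status table is skipped. */
static int is_departing(long m, const StatusEntry *st, size_t n_st)
{
    size_t j;

    for (j = 0; j < n_st; j++)
        if (st[j].member == m &&
            (st[j].status == DEPARTURE_AGREED || st[j].status == DEPARTURE_PENDING))
            return 1;
    return 0;
}

/* gv[] lists the members in clockwise order. Returns the nearest
   counter-clockwise member of pi that is not departing, or pi itself
   if every other member is departing. */
long next_acwnbr(const long *gv, size_t n, long pi,
                 const StatusEntry *st, size_t n_st)
{
    size_t i, step;

    for (i = 0; i < n && gv[i] != pi; i++)
        ;
    assert(i < n);                          /* pi must be in its own view */
    for (step = 1; step < n; step++) {
        long cand = gv[(i + n - step) % n]; /* walk counter-clockwise */
        if (!is_departing(cand, st, n_st))
            return cand;
    }
    return pi;
}
```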

Fig. 4.16 shows the interaction of the Initiate-Departure process with other processes. The Initiate-Departure process is divided into 3 sub-processes: Failure-Monitor, Agreement-Initiator, and Timing-process.

Figure 4.16: Initiate-Departure process

Failure-Monitor does the processing for failure detection. It receives a status query from the FIFO-Channel-Layer process and checks whether the querying process address is the same as the previous one. If it is the same, it sends a status report to the address specified. If it is different, it records the new monitoring member as its clockwise neighbor. This process periodically queries the counter-clockwise neighbor and waits for a status report. If the status report is not received within a certain time, it shuts off communication from that process and sends an Initiate-Agreement signal to the Agreement-Initiator process. It then obtains its new neighbor from the GroupView-Manager and sends a query to the new process.

The Agreement-Initiator process initiates an agreement phase when the counter-clockwise neighbor is perceived to have failed. It receives a message from the Failure-Monitor process indicating the address of the member perceived to have failed. It adds the agreement token to the TokenPool and sends the token to the new neighbor. The Timing-process keeps track of time for periodic actions. It signals when a timeout has occurred since the previous status report, and it resets the timer on receipt of the reset-timer signal. The Failure-Monitor process has 10 data flows.

Input data flows are Status-Query-in, Status-Report-in, Member-Status, NeighborAddress and Timeout-message.

Output data flows are Status-Query-out, Status-Report-out, Update-Status, Initiate-Agreement and Reset-timer.

The process specification is given in Fig. 4.17. The Agreement-Initiator process has 6 data flows.

Input data flows are TokenPool and Initiate-Agreement.
Failure-Monitor process

while (true)
    Read message from FIFO-Channel-Layer process;
    /* Formal algorithm is given in Figs. 2 and 3 of [Ref. ShDr] */
    if (message == Status-Report-in)
        wait till Timeout-message is received;    /* line 11 */
        reset timer;
        send Status-Query-out;
    else if (message == Status-Query-in)
        check Initiator-Address;
        if (Initiator-Address == Pmon)
            send Status-Report-out to Initiator-Address;
        else
            send Status-Report-out to Initiator-Address;
            send TokenPool to Initiator-Address;
            Pmon ← Initiator-Address;
        end if;
    else if (message == Timeout-message)
        send Initiate-Agreement message to Agreement-Initiator process;
        get new neighbor address from GroupView-Manager;
        send Status-Query-out to new counter-clockwise neighbor;
        reset timer;
    end if;
end while;
end Failure-Monitor.

Figure 4.17: Failure-Monitor
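The decision logic of the Failure-Monitor in Fig. 4.17 can be separated from the messaging, as in the C sketch below. It is a hypothetical state machine: times are passed in explicitly by the caller, and the field and function names are illustrative. A status query from a new initiator triggers the additional TokenPool transfer, and a missing Status-Report-in beyond the threshold signals a perceived departure.

```c
#include <assert.h>

/* Hypothetical Failure-Monitor state; times are supplied by the caller. */
typedef struct {
    long pmon;              /* member that currently queries us (clockwise neighbor) */
    long last_query_sent;   /* time of the last Status-Query-out */
    long threshold;         /* maximum wait for a Status-Report-in */
    int  awaiting_report;   /* 1 while a query is outstanding */
} FailureMonitor;

/* Status-Query-in from initiator. Returns 1 if the TokenPool must be sent in
   addition to the Status-Report-out, i.e. the initiator is a new monitor. */
int on_status_query(FailureMonitor *fm, long initiator)
{
    if (initiator == fm->pmon)
        return 0;
    fm->pmon = initiator;
    return 1;
}

/* Status-Query-out sent to the counter-clockwise neighbor at time now. */
void on_query_sent(FailureMonitor *fm, long now)
{
    assert(fm->threshold > 0);
    fm->last_query_sent = now;
    fm->awaiting_report = 1;
}

/* Status-Report-in received: the neighbor is alive. */
void on_status_report(FailureMonitor *fm)
{
    fm->awaiting_report = 0;
}

/* Returns 1 if the outstanding query has gone unanswered for longer than the
   threshold, i.e. the counter-clockwise neighbor is perceived as departed. */
int timed_out(const FailureMonitor *fm, long now)
{
    return fm->awaiting_report && now - fm->last_query_sent > fm->threshold;
}
```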
Agreement-Initiator process

while (true)
    wait for Initiate-Agreement message;
    read address of failed member;
    read address of new neighbor;
    /* acwnbr(pi) is the failed member. */
    /* cwnbr(pi) is the new monitor. */
    send agreePi(failed-member) to new neighbor;
    send Token to TokenPool;
    update status as Departure Agreed;
end while;

end Agreement-Initiator.


Figure 4.18: Initiate-Agreement process
Output data flows are TokenPool, Token, Token, Update-Status

The process specification is given in Fig. 4.18. The actions performed are given in
pseudocode form in lines 6-10 of Fig. 2 of [Ref. ShDr].
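When a timeout is reported, the Agreement-Initiator needs three members of the ring: the failed counter-clockwise neighbor, the member to be monitored next, and the neighbor that receives the agreement token. The sketch below computes them from a GroupView stored as an array of member ids in clockwise order and formats the token in the agreePi(pj) notation of Chapter V. It follows the worked example there, in which p3 sends agreeP3(p2) to its clockwise neighbor p4; the array representation and the function names are assumptions.

```c
#include <stddef.h>
#include <stdio.h>

/* Members involved when view[self] times out on its ccw neighbor. */
struct failure_actions {
    int failed;      /* acwnbr(pi): member perceived to have failed */
    int new_monitor; /* member pi must query from now on            */
    int send_to;     /* cwnbr(pi): receives the agreement token     */
};

/* view[] holds member ids in clockwise order, so view[(i + 1) % n] is the
   clockwise neighbor of view[i]. Assumes n >= 3. */
struct failure_actions on_timeout(const int *view, int n, int self)
{
    int failed = (self + n - 1) % n;
    struct failure_actions a = {
        .failed = view[failed],
        .new_monitor = view[(failed + n - 1) % n],
        .send_to = view[(self + 1) % n],
    };
    return a;
}

/* Format an agreement token in the thesis notation, e.g. "agreeP3(p2)".
   Returns the snprintf result (length of the formatted token). */
int format_agree_token(char *buf, size_t cap, int initiator, int member)
{
    return snprintf(buf, cap, "agreeP%d(p%d)", initiator, member);
}
```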

The Timing-process has 2 data flows. The input data flow is Reset-timer and the output
data flow is Timeout-message. The process specification is given in Fig.
4.19.
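The Timing-process only has to signal a timeout when no status report arrives within the query period. One common way to obtain that behaviour on UNIX, sketched here as an assumption rather than as the thesis implementation, is to block in select() on the descriptor that carries status reports, with the query period as the timeout; each report that arrives starts a fresh wait, which plays the role of Reset-timer. The helper name and the millisecond period are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait up to timeout_ms for a status report on fd.
   Returns the number of bytes read into buf (> 0) when a report arrives,
   0 when the period expires with nothing to read (raise Timeout-message),
   and -1 on error or end of file. */
ssize_t wait_for_report(int fd, long timeout_ms, char *buf, size_t len)
{
    fd_set readfds;
    struct timeval tv;

    /* select() may modify both arguments, so rebuild them on every call. */
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;                /* timer expired: no status report */

    ssize_t got = read(fd, buf, len);
    return got > 0 ? got : -1;   /* 0 from read() means the peer closed fd */
}
```

A monitoring loop would call this repeatedly, treating a return value of 0 as a Timeout-message.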

3.      Initiate-Join 

This process performs all the steps involved in adding a new member. If it receives a
Join-Request and its member is the host, or if it receives a Send-Init-Param message,
it performs the following actions.

1. It receives the Current-GroupView from the GroupView-Manager.

2. It receives the Current-StatusTable from the StatusTable-Manager.

3. It receives the Current-TokenPool from the TokenPool-Manager.
Timing-process process

while (true)
    wait for timer-overflow or reset-timer message;
    if timer-overflow
        send Timeout-message;
        disable timer;
    else
        wait for timer-overflow;
        reset timer;
        enable timer;
        send timeout message;
    end if;
end while;

end Timing-process.

Figure 4.19: Timing process

4.  It  sends  all  of  the  above  information  to  the  new  member  in  a  consolidated 
message. 

If a Join-Request is received and this member is not the host, it sends a Joinreqst
token to its clockwise neighbor and adds the token to the TokenPool. This process has
8 data flows.


Input data flows are Join-Requests, Current-TokenPool, Current-Status-Table, GroupView,
Send-init-param

Output data flows are Initial-Parameters-Send, Token, Token

The  process  specification  is  given  in  Fig.  4.20. 
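The thesis does not give the layout of the consolidated Initial-Parameters-Send message. Purely as an illustration, the sketch below packs the ViewNumber, the GroupView members, the StatusTable entries, and the TokenPool tokens into one space-separated text message; the `VIEW`/`MEMBERS`/`STATUS`/`TOKENS` keywords, the function names, and the string form of the entries are all hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Append text at buf + *used; returns -1 if it would not fit together
   with the terminating NUL. */
static int put(char *buf, size_t cap, size_t *used, const char *text)
{
    int w = snprintf(buf + *used, cap - *used, "%s", text);
    if (w < 0 || (size_t)w >= cap - *used)
        return -1;
    *used += (size_t)w;
    return 0;
}

/* Build e.g. "VIEW 6 MEMBERS 0 1 2 3 4 5 STATUS TOKENS commitP0(p5)".
   Returns the message length, or -1 if buf is too small. */
int build_init_params(char *buf, size_t cap, int view_number,
                      const int *members, int n,
                      const char *const *status, int nstat,
                      const char *const *tokens, int ntok)
{
    char num[32];
    size_t used = 0;

    if (cap == 0)
        return -1;
    buf[0] = '\0';
    snprintf(num, sizeof num, "VIEW %d MEMBERS", view_number);
    if (put(buf, cap, &used, num) < 0)
        return -1;
    for (int i = 0; i < n; i++) {
        snprintf(num, sizeof num, " %d", members[i]);
        if (put(buf, cap, &used, num) < 0)
            return -1;
    }
    if (put(buf, cap, &used, " STATUS") < 0)
        return -1;
    for (int i = 0; i < nstat; i++)
        if (put(buf, cap, &used, " ") < 0 || put(buf, cap, &used, status[i]) < 0)
            return -1;
    if (put(buf, cap, &used, " TOKENS") < 0)
        return -1;
    for (int i = 0; i < ntok; i++)
        if (put(buf, cap, &used, " ") < 0 || put(buf, cap, &used, tokens[i]) < 0)
            return -1;
    return (int)used;
}
```

A text form keeps the sketch easy to inspect; a binary layout would be more compact but needs care with byte order between hosts.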
Initiate-Join process

while (true)
    wait for Join-Requests or Send-Init-Param;
    if (Join-Requests == true)
        if (host-process)
            send Joinagree to clockwise neighbor;
            update Status-Table and add Token to TokenPool;
        else
            send Joinreqst to clockwise neighbor;
            add Joinreqst to TokenPool;
        end if;
    end if;
    if (Send-Init-Param == true)
        get Current-TokenPool from TokenPool-Manager;
        get Current-GroupView from GroupView-Manager;
        get Current-StatusTable from StatusTable-Manager;
        convert all these messages to Initial-Parameters-Send message;
        send the message to the new member;
    end if;
end while;

end Initiate-Join.


Figure 4.20: Initiate-Join process

4.      Agreement Process

This process does the agreement token processing. Whenever it receives a token, it
checks the token type. If it is a commit token, it sends an Initiate-Commit message to
the Commit-Process. If it is an agreement token, it performs the following actions.

1. It checks whether the token is a duplicate.

2. If the token is not a duplicate, it adds the token to the TokenPool and forwards it
to its clockwise neighbor.

3. If the token is a duplicate, it checks, based on entries in the GroupView and
Status-Table, whether the current process has to initiate commit.

If it is a join request token, it performs the following actions.

1. It checks whether it is the host process.

2. If it is not the host process, it sends the token to its clockwise neighbor and adds
it to the TokenPool.

3. If it is the host, it initiates the join by sending a Joinagree token to its
clockwise neighbor, updating the Status-Table, and adding the token to the TokenPool.

This process has 9 data flows.

Input data flows are Token, Current-Neighbor, TokenStatus, Neighbor-Status

Output data flows are Token, Neighbor-member, Token, Initiate-Commit, Send-init-param

The formal algorithm is given in Fig. 5 of [Ref. ShDr]. The process specification is
given in Fig. 4.21.
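The commit-initiation test in Fig. 4.21 decides which member commits a returning (duplicate) agreement token: its initiator or, if the initiator has failed, the first operational member after it. A minimal sketch, assuming the GroupView is held as an array in clockwise order, that `operational[k]` records what the local StatusTable says about the k-th member, and that "between" means clockwise from the initiator to the current member; the function name is hypothetical, and the rank comparison used when several agreement tokens are pending is omitted.

```c
#include <stdbool.h>

/* initiator and self are positions in the clockwise GroupView array. */
bool should_initiate_commit(const bool *operational, int n,
                            int initiator, int self)
{
    if (initiator == self)
        return true;              /* the token has come back to its initiator */
    if (operational[initiator])
        return false;             /* the initiator will commit it itself */
    for (int k = (initiator + 1) % n; k != self; k = (k + 1) % n)
        if (operational[k])
            return false;         /* an earlier survivor is responsible */
    return true;
}
```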

5.      Commit  Process 

The Commit process is responsible for committing the removal or the joining of a
member. It waits for an Initiate-Commit message specifying the member's address and the
action to be performed. The functions performed by this process are as follows.

1.  Receive  Initiate-Commit  from  Agreement-Process. 
Agreement-Process at pi

while (true)
    wait for Token;
    if (Token-type == Joincomit or Failcomit)
        send Initiate-Commit to Commit-Process;
    else if (Token-type == Joinreqst)
        if (host-process)
            send Joinagree to clockwise-neighbor;
            update Status-Table and add Token to TokenPool;
        else
            send Joinreqst to clockwise-neighbor;
            add Joinreqst to TokenPool;
        end if;
    else if (Token-type == Joinagree or Failagree)
        if ((Token-type == Joinagree) and (Member-address is in GroupView))
            exit;
        if (TokenStatus == old)
            if ((Initiator-address == self-address) or
                ((status of Initiator-address == not operational) and
                 (status of all members between initiator and self is non operational)))
                if more tokens present
                    compare rank of all agreement tokens;
                    if (rank of self is minimum) initiate commit;
                    else update status as JoinPending or DeparturePending;
                    end if;
                else initiate commit for agreement process;
                end if;
            end if;
        else
            add Token to TokenPool and send Token to clockwise neighbor;
            update status in Status-Table;
        end if;
    end if;
end while;

end Agreement-Process.

Figure 4.21: Agreement process
2. Check whether it is a duplicate token.

3. If it is not a duplicate token, purge all commit tokens that precede the agreement
token of the member being committed.

4. Update the Status-Table by deleting the entry corresponding to the leaving member.

5. Update the GroupView and its ViewNumber, deleting the member for a Failcomit and
adding the member for a Joincomit.

This  process  has  4  data  flows. 

The input data flow is Initiate-Commit

Output  data  flows  are  Update-GroupView,  Update-Status-Table,  Update-TokenPool 

The formal algorithm is given in Fig. 6 of [Ref. ShDr]. The process specification is
given in Fig. 4.22.
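Step 5 of the commit process can be sketched as an update of an in-memory GroupView. The sketch assumes the view is an array of member ids in clockwise order with a fixed maximum size (`MAX_MEMBERS`); the representation and names are assumptions, and in the thesis the update itself is carried out by the GroupView-Manager on receipt of Update-GroupView. New members are appended at the end of the ring, as the GroupView-Manager description states.

```c
#include <stdbool.h>

#define MAX_MEMBERS 32

/* Hypothetical in-memory GroupView: member ids in clockwise ring order. */
struct group_view {
    int members[MAX_MEMBERS];
    int n;
    int view_number;
};

/* Apply a committed change: Failcomit (join == false) removes the member,
   closing the ring around it; Joincomit (join == true) appends it at the
   end of the ring. Either way the ViewNumber is incremented.
   Returns false, leaving the view unchanged, if the change does not apply. */
bool apply_commit(struct group_view *gv, int member, bool join)
{
    int pos = -1;
    for (int i = 0; i < gv->n; i++)
        if (gv->members[i] == member)
            pos = i;

    if (join) {
        if (pos >= 0 || gv->n == MAX_MEMBERS)
            return false;
        gv->members[gv->n++] = member;
    } else {
        if (pos < 0)
            return false;
        for (int i = pos; i < gv->n - 1; i++)
            gv->members[i] = gv->members[i + 1];
        gv->n--;
    }
    gv->view_number++;
    return true;
}
```

Removing member 2 from {0, 1, 2, 3, 4, 5} with ViewNumber 6 yields {0, 1, 3, 4, 5} with ViewNumber 7, the transition described for p2's departure in Chapter V.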

6.      TokenPool  Manager 

This process manages the TokenPool, which keeps a record of all tokens sent. It
maintains a linked list of token entities and serves other processes in a client-server
relationship: a client process requests a service by writing a message onto a message
queue, with the requested service embedded in the message. Depending on the service
requested, it performs the following functions.

1.  It  adds  a  Token  to  the  tail  of  the  list. 

2. It removes all commit tokens occurring before a particular agreement token and adds
the commit token to the tail of the list.
Commit-Process at pi

while (true)
    wait for Token;
    if (TokenStatus == New)
        send Update-GroupView message to GroupView-Manager;
        send Update-Status-Table message to StatusTable-Manager;
        send Update-TokenPool message to TokenPool-Manager;
        send token to clockwise-neighbor process;
        update host-process address and counter-clockwise neighbor address;
        if ((host-process) and (Joincomit))
            send Send-Init-Param message to Initiate-Join process;
        end if;
        if (DeparturePending or JoinPending)
            if (rank of pending member is minimum)
                initiate-commit;
            end if;
        end if;
    end if;
end while;

end Commit-Process.


Figure 4.22: Commit process




TokenPool-Manager process

while (true)
    wait for message;
    if (message-type == Tokenstat)
        send status of token to client process;
    else if (message-type == Tkpoolreq)
        send contents of TokenPool to client process;
    else if (message-type == Addtoken)
        add token to the end of TokenPool;
    else if (message-type == inittknpl)
        extract tokens from message and create TokenPool;
    else if (message-type == uptknpool)
        remove all commit tokens before agreement token;
        remove Joinreqst token for member being committed;
        remove agreement token and add commit token in the TokenPool;
    end if;
end while;

end TokenPool-Manager.


Figure 4.23: TokenPool-Manager process

3. It initialises the TokenPool list based on the information received from the
message.

4. It sends the content of the TokenPool to the requesting client.

5. It gives the status of a token as New or Old.
This process has 8 data flows.

Input data flows are Token, Token, Token, Initial-TokenPool, Update-TokenPool
Output data flows are Current-TokenPool, TokenStatus, TokenPool
The process specification is given in Fig. 4.23.
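The garbage collection performed for an uptknpool request (Fig. 4.23) can be sketched on a singly linked list. The token representation, the enum names, and the function names are assumptions; only the removal rules come from the text.

```c
#include <stdbool.h>
#include <stdlib.h>

enum token_kind { AGREE, COMMIT, JOINREQST };

/* Hypothetical token entity: {AGREE, 3, 2} stands for agreeP3(p2). */
struct token {
    enum token_kind kind;
    int initiator;
    int member;
    struct token *next;
};

struct token_pool {
    struct token *head, *tail;
};

/* Addtoken: append a token at the tail of the list. */
bool add_token(struct token_pool *tp, enum token_kind kind,
               int initiator, int member)
{
    struct token *t = malloc(sizeof *t);
    if (t == NULL)
        return false;
    *t = (struct token){ kind, initiator, member, NULL };
    if (tp->tail != NULL)
        tp->tail->next = t;
    else
        tp->head = t;
    tp->tail = t;
    return true;
}

/* Uptknpool: drop every commit token preceding member's agreement token,
   member's Joinreqst token and the agreement token itself, then append the
   new commit token. Returns false, leaving the pool unchanged, if there is
   no agreement token for member. */
bool update_on_commit(struct token_pool *tp, int initiator, int member)
{
    struct token *agree = tp->head;
    while (agree != NULL && !(agree->kind == AGREE && agree->member == member))
        agree = agree->next;
    if (agree == NULL)
        return false;

    bool before_agree = true;
    struct token *prev = NULL, *t = tp->head;
    while (t != NULL) {
        struct token *next = t->next;
        bool is_agree = (t == agree);
        bool drop = is_agree
                 || (t->kind == COMMIT && before_agree)
                 || (t->kind == JOINREQST && t->member == member);
        if (is_agree) {
            before_agree = false;
            agree = NULL;            /* about to be freed: stop comparing */
        }
        if (drop) {
            if (prev != NULL)
                prev->next = next;
            else
                tp->head = next;
            if (tp->tail == t)
                tp->tail = prev;
            free(t);
        } else {
            prev = t;
        }
        t = next;
    }
    return add_token(tp, COMMIT, initiator, member);
}
```

Starting from the pool {commitP0(p5), agreeP3(p2)} at p3, committing p2's departure leaves only commitP3(p2), which matches the state described for p3 in Chapter V.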
7.  StatusTable Manager

The StatusTable Manager keeps track of the status of the various members. It acts as a
server providing services to various clients. Depending on a client's request, it
performs the following functions.

1. It checks whether a member is listed in the Status-Table.

2. It gives the current status of the member in the Status-Table.

3. It updates the status of the member in the Status-Table.

4. It creates a message listing all the members of the table with their current
status, for new members.

5. It creates a new Status-Table from the message received from the host.
This process has 5 data flows.

Input data flows are Update-Status-Table, Update-Status, Initial-Status-Table
Output data flows are Member-Status, Current-Status-Table

The process specification is given in Fig. 4.24.
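A fixed-size array is enough to sketch the StatusTable operations served for the Statmembr, Updtstatus, and Removmem requests of Fig. 4.24. The status names, the capacity `MAX_ENTRIES`, and the function names are hypothetical.

```c
#include <stdbool.h>

/* A hypothetical subset of the status values used in the text. */
enum member_status { FAILAGREE, JOINAGREE, DEPARTURE_PENDING, JOIN_PENDING };

#define MAX_ENTRIES 32

struct status_entry {
    int member;
    enum member_status status;
};

struct status_table {
    struct status_entry e[MAX_ENTRIES];
    int n;
};

static int find(const struct status_table *st, int member)
{
    for (int i = 0; i < st->n; i++)
        if (st->e[i].member == member)
            return i;
    return -1;
}

/* Statmembr: look up a member's status; false if it has no entry. */
bool lookup_status(const struct status_table *st, int member,
                   enum member_status *out)
{
    int i = find(st, member);
    if (i < 0)
        return false;
    *out = st->e[i].status;
    return true;
}

/* Updtstatus: change the member's status, adding an entry if absent. */
bool update_status(struct status_table *st, int member, enum member_status s)
{
    int i = find(st, member);
    if (i < 0) {
        if (st->n == MAX_ENTRIES)
            return false;
        i = st->n++;
        st->e[i].member = member;
    }
    st->e[i].status = s;
    return true;
}

/* Removmem: delete the member's entry (entry order is not preserved). */
bool remove_member(struct status_table *st, int member)
{
    int i = find(st, member);
    if (i < 0)
        return false;
    st->e[i] = st->e[--st->n];
    return true;
}
```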

8.  GroupView  Manager 

The GroupView-Manager manages the membership list and the ViewNumber. It interacts with
other processes in a client-server relationship and, depending on a client's request,
performs the following functions.

1.  Check  if  a  given  member  is  in  the  GroupView. 

2.  Add  a  new  member  to  the  end  of  the  group. 

3.  Delete  a  given  member  from  the  group  and  maintain  the  logical  ring. 
StatusTable-Manager process

while (true)
    wait for message;
    if (message-type == Statmembr)
        send status of member to client process;
    else if (message-type == statblreq)
        send contents of Status-Table to client process;
    else if (message-type == Updtstatus)
        change status of member and add if not present already;
    else if (message-type == inittable)
        extract Status-Table from message and create Status-Table;
    else if (message-type == Removmem)
        remove status-table entry of member from Status-Table;
    end if;
end while;

end StatusTable-Manager.


Figure 4.24: StatusTable-Manager process

4. Send all the members of the current view to the joining member if this member is
the host.

5. Create the GroupView from the message received from the host.
This process has 7 data flows.

Input data flows are Neighbor-member, Initial-GroupView, Update-GroupView

Output data flows are GroupView, Current-Neighbor, Current-GroupView, NeighborAddress

The  process  specification  is  given  in  Fig.  4.25. 
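Because the ring order is implicit in the GroupView, the membership and neighbor queries answered by the GroupView-Manager reduce to index arithmetic. A sketch, assuming the GroupView is stored as an array of member ids in clockwise order; the representation and the function names are assumptions.

```c
#include <stdbool.h>

/* view[(i + 1) % n] is the clockwise neighbor of view[i]. */
static int position(const int *view, int n, int member)
{
    for (int i = 0; i < n; i++)
        if (view[i] == member)
            return i;
    return -1;
}

/* Check whether a given member is in the GroupView. */
bool is_member(const int *view, int n, int member)
{
    return position(view, n, member) >= 0;
}

/* neibraddr: clockwise neighbor of member, or -1 if member is absent. */
int clockwise_neighbor(const int *view, int n, int member)
{
    int i = position(view, n, member);
    return i < 0 ? -1 : view[(i + 1) % n];
}

/* Counter-clockwise neighbor: the member that `member` monitors. */
int counter_clockwise_neighbor(const int *view, int n, int member)
{
    int i = position(view, n, member);
    return i < 0 ? -1 : view[(i + n - 1) % n];
}
```

With the view {p0, p1, p3, p4, p5} reached after p2 departs, the clockwise neighbor of p1 is p3 and the counter-clockwise neighbor of p3 is p1, matching the reconfigured ring in Chapter V.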
GroupView-Manager process

while (true)
    wait for message;
    if (message-type == Updtgview)
        add or remove member and increment ViewNumber;
    else if (message-type == gpviewreq)
        send contents of group view to client process;
    else if (message-type == neibraddr)
        send the address of clockwise neighbor of member;
    else if (message-type == initgview)
        extract GroupView from message and create GroupView;
    end if;
end while;

end GroupView-Manager.


Figure 4.25: GroupView-Manager process

9.      Join Initial

This process performs all the initialization when a member joins a group. It receives
the Initial-Parameters-Receive message from the FIFO-Channel-Layer process and extracts
the Initial-TokenPool, Initial-Status-Table, and Initial-GroupView from it. It sends the
Initial-TokenPool as a message to the TokenPool-Manager, the Initial-GroupView to the
GroupView-Manager, and the Initial-Status-Table to the StatusTable-Manager.
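The extraction step can be sketched with sscanf, assuming a hypothetical space-separated text layout such as `VIEW 6 MEMBERS 0 1 2 3 4 5 STATUS ... TOKENS ...`; the thesis does not specify the message format, so the layout and the function name are assumptions. Only the GroupView part is parsed here; the StatusTable and TokenPool parts would be handled the same way before being forwarded to their managers.

```c
#include <stdio.h>

/* Extract the ViewNumber and member ids from an Initial-Parameters
   message. Returns 0 on success, -1 if the message does not start with
   "VIEW <number> MEMBERS". */
int parse_initial_view(const char *msg, int *view_number,
                       int *members, int max, int *n)
{
    int off = 0;
    /* %n is stored only if the literal "MEMBERS" matched as well. */
    if (sscanf(msg, "VIEW %d MEMBERS%n", view_number, &off) != 1 || off == 0)
        return -1;

    const char *p = msg + off;
    int id, used;
    *n = 0;
    /* Read ids until the next keyword (or the end) stops the %d match. */
    while (*n < max && sscanf(p, "%d%n", &id, &used) == 1) {
        members[(*n)++] = id;
        p += used;
    }
    return 0;
}
```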

D.     IMPLEMENTATION  ON  UNIX  MACHINES 

In this section, some of the communication protocols that can be used for creating a
ring of First-In-First-Out (FIFO) communication channels are discussed. A ring of FIFO
channels is created by logically ordering the group members in a ring and
interconnecting them through FIFO communication channels. Since implementing such a
ring requires the interaction of two or more processes, a study of the methods for
communicating between processes is necessary. UNIX provides several inter-process
communication (IPC) methods; we discuss each of them, its relative merits and
problems, and how it can be used to implement the algorithm based on a logical ring of
group members.
Such communication is not limited to processes on two different systems; it also takes
place between processes on a single system. We deal with the following types of IPC
for intra-machine communication:

•  Pipes 

•  FIFOs (named pipes)

•  Message  Queues 

We deal with the following IPCs for inter-host communication:

•  Sockets 

•  Transport  Layer  Interfaces  (TLI). 

For each IPC technique we discuss how a client-server relationship can be set up, and
for the IPCs within the same host we also compare their relative merits and problems.

1.      Pipes 

Pipes [Ref. SR90] [Ref. Roch] [Ref. CM89] are provided by all flavors of UNIX. A pipe
provides a one-way flow of data and is created by the pipe system call.

int pipe(int *filedes);

Two file descriptors are returned: filedes[0], which is open for reading, and
filedes[1], which is open for writing. Pipes are of little use within a single process;
they are typically used to communicate between two different processes in the
following way. First, a process creates a pipe and then forks to create a copy of
itself. Next, the parent process closes the read end of the pipe and the child closes
the write end. This provides a one-way flow of data from parent to child. For a
two-way flow of data, two pipes must be created, one for each direction. The actual
steps are given below [Ref. SR90].

•  Create pipe1, create pipe2,

•  fork,

•  parent closes read end of pipe1,

•  parent closes write end of pipe2,

•  child closes write end of pipe1,

•  child closes read end of pipe2.

The biggest disadvantage of pipes is that they can be used only between processes that
have a common ancestor. This is because a pipe can be passed from one process to
another only through the fork system call, since all open files are shared by the
parent and the child after a fork. There is no way for two totally unrelated processes
to create a pipe between them and use it for IPC.
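The two-pipe arrangement listed above can be sketched in C. The helper name `pipe_echo` and the echoing child are illustrative assumptions, but the close() calls follow the listed steps one for one.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send msg to a forked child over pipe1 and read the child's echo back
   over pipe2. Returns the number of bytes echoed into reply, or -1 on
   error. Suitable only for messages smaller than the pipe capacity,
   because the parent writes everything before it starts reading. */
ssize_t pipe_echo(const char *msg, size_t len, char *reply, size_t cap)
{
    int pipe1[2], pipe2[2];          /* pipe1: parent -> child, pipe2: child -> parent */

    if (pipe(pipe1) < 0)
        return -1;
    if (pipe(pipe2) < 0) {
        close(pipe1[0]);
        close(pipe1[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(pipe1[0]); close(pipe1[1]);
        close(pipe2[0]); close(pipe2[1]);
        return -1;
    }
    if (pid == 0) {                  /* child */
        close(pipe1[1]);             /* child closes write end of pipe1 */
        close(pipe2[0]);             /* child closes read end of pipe2  */
        char buf[256];
        ssize_t n;
        while ((n = read(pipe1[0], buf, sizeof buf)) > 0)
            if (write(pipe2[1], buf, (size_t)n) != n)
                _exit(1);
        _exit(0);
    }

    close(pipe1[0]);                 /* parent closes read end of pipe1 */
    close(pipe2[1]);                 /* parent closes write end of pipe2 */
    ssize_t sent = write(pipe1[1], msg, len);
    close(pipe1[1]);                 /* EOF tells the child to finish */

    size_t got = 0;
    ssize_t n;
    while (got < cap && (n = read(pipe2[0], reply + got, cap - got)) > 0)
        got += (size_t)n;
    close(pipe2[0]);
    waitpid(pid, NULL, 0);
    return sent == (ssize_t)len ? (ssize_t)got : -1;
}
```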
2.      FIFOs 

FIFO [Ref. SR90, Roch, CM89] stands for First In, First Out. A Unix FIFO is similar to
a pipe: it is a one-way flow of data, with the first byte written to it being the
first byte read from it. Unlike a pipe, a FIFO has a name attached to it, allowing
unrelated processes to access a single FIFO. A FIFO is created by the mknod system
call.

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and it is the name of the FIFO. The mode
argument specifies the file type, S_IFIFO for a FIFO, combined with the access mode for
the file (read and write permissions for owner, group, and world). The dev argument is
ignored for a FIFO. Once the FIFO is created, it must be opened for reading and for
writing using the open system call. Three system calls are thus required for creating
a FIFO and opening it for reading and writing. The sequence of actions is:

•  Create  FIFO 

•  Open  FIFO  for  reading  and  get  file  descriptor  for  reading 

•  Open  FIFO  for  writing  and  get  file  descriptor  for  writing 

Only one system call, pipe, does the same thing for pipes. One of the rules followed by
pipes and FIFOs is that a write is guaranteed to be atomic if the amount of data
written does not exceed the limit PIPE_BUF (at least 512 bytes under POSIX and commonly
4096 bytes). If a larger amount is written, data from different writers may be
interleaved, and atomicity is not guaranteed. Some care must be taken in the order of
open calls to avoid a deadlock. When the client opens FIFO1 for writing, it waits for
the server program to open FIFO1 for reading. If the server's first open call is for
FIFO2 instead of FIFO1, each process waits for the other and neither proceeds, which is
a deadlock. One disadvantage of pipes and FIFOs is that they follow a stream I/O model:
there are no built-in message boundaries, so boundaries must be delineated by the
application, for example with the newline character, and there is no direct support
for structured messages.
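A minimal sketch of creating a FIFO and opening both of its ends. It uses mkfifo(), the POSIX interface equivalent to calling mknod() with S_IFIFO in the mode; the function name and the permission bits are assumptions. Opening the read end with O_NONBLOCK first means the following blocking open for writing already finds a reader and returns at once, which avoids the open-order deadlock described above within one process; two cooperating processes must instead agree on a consistent open order.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create the FIFO at path (reusing it if it already exists) and open it.
   On success returns 0 with fds[0] open for reading and fds[1] open for
   writing; returns -1 on error. */
int open_fifo_pair(const char *path, int fds[2])
{
    if (mkfifo(path, 0600) < 0 && errno != EEXIST)
        return -1;

    /* Reader first, non-blocking, so the writer's open() cannot block. */
    fds[0] = open(path, O_RDONLY | O_NONBLOCK);
    if (fds[0] < 0)
        return -1;

    fds[1] = open(path, O_WRONLY);
    if (fds[1] < 0) {
        close(fds[0]);
        return -1;
    }
    return 0;
}
```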

3.      Message  Queues 

Message queues [Ref. SR90, Roch] are used to pass messages between processes in System V
implementations. Processes read and write messages on arbitrary queues and, unlike
pipes and FIFOs, there is no requirement that any process be waiting to read before
another process is allowed to write a message to a queue. It is possible for a process
to write a message to a queue and exit, and have another process read the message at a
later time. Each message on a queue has the following attributes:
Message buffer structure

struct msgbuf {
    long mtype;      /* message type, must be greater than zero */
    char mtext[1];   /* message data (variable length) */
};


Figure 4.26: Message queue structure

•  a long integer type,

•  the length of the data portion of the message, and

•  the data (if the length is greater than zero).

The message queue can be thought of as a linked list of messages. A new message queue
is created, or an existing one accessed, using the msgget system call. The value
returned by msgget is the queue identifier, msqid. Once a message queue is opened, a
message is put on the queue using the msgsnd system call.

int msgsnd(int msqid, struct msgbuf *ptr, int length, int flags);
The ptr argument points to a structure with the template shown in Fig. 4.26, which
specifies the structure of a message in a message queue. The message type must be
greater than zero because msgrcv treats the requested type specially: a type of zero
returns the first message on the queue, while a positive type returns only messages of
that particular type. This is very useful for multiplexing messages. One way of
multiplexing a single server with multiple clients is to use one message type for
communication from clients to the server and to embed, in each message, the type of
message the client will wait for in reply. For example, the message type from client
to server could be 1, and each client includes its process id in the message. The
server uses this process id as the type when sending messages to that client, and the
client receives only the messages addressed to it by specifying its process id as the
type of message it wants to receive. We now analyze a program in which a server gets
messages from 4 clients on a message queue, processes their requests, and replies to
them over the queue.

In this program, the server maintains a linked list of the members of a GroupView.
There are four clients, which can perform the following functions.

•  Initialise the membership list.

•  Add a member to the list.

•  Remove a member from the list.

•  Request the current list of members.

The server gets this information from the clients and uses their process ids to send
replies back to them. The server waits for messages from clients and acts on them as
they arrive; when there is more than one message, they are processed in order of
arrival. While idle, the server is always waiting in the msgrcv system call. All the
IPCs discussed so far deal with communication only within the same host. We now deal
with methods of communicating between different hosts.
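The type-based multiplexing described in this section (requests on type 1, replies on the client's process id) can be sketched as follows. The struct layouts, the 64-byte text field, and the function names are assumptions; msgget, msgsnd, msgrcv, and msgctl are the standard System V calls.

```c
#define _XOPEN_SOURCE 700
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>
#include <unistd.h>

#define SERVER_TYPE 1L           /* all client-to-server requests use type 1 */

/* Hypothetical request and reply layouts; the first member must be the
   long message type expected by msgsnd() and msgrcv(). */
struct request {
    long mtype;
    pid_t client;                /* reply type the client will wait for */
    char text[64];
};

struct reply {
    long mtype;
    char text[64];
};

/* Client: send a request tagged with our pid. */
int client_send(int msqid, const char *text)
{
    struct request rq = { .mtype = SERVER_TYPE, .client = getpid() };
    strncpy(rq.text, text, sizeof rq.text - 1);
    return msgsnd(msqid, &rq, sizeof rq - sizeof rq.mtype, 0);
}

/* Server: take the next type-1 request and echo its text back, using the
   client's pid as the reply type. */
int server_handle_one(int msqid)
{
    struct request rq;
    if (msgrcv(msqid, &rq, sizeof rq - sizeof rq.mtype, SERVER_TYPE, 0) < 0)
        return -1;
    struct reply rp = { .mtype = (long)rq.client };
    memcpy(rp.text, rq.text, sizeof rp.text);
    return msgsnd(msqid, &rp, sizeof rp - sizeof rp.mtype, 0);
}

/* Client: receive only messages whose type equals our pid. */
int client_receive(int msqid, char *out, size_t cap)
{
    struct reply rp;
    if (msgrcv(msqid, &rp, sizeof rp - sizeof rp.mtype, (long)getpid(), 0) < 0)
        return -1;
    if (cap > 0) {
        strncpy(out, rp.text, cap - 1);
        out[cap - 1] = '\0';
    }
    return 0;
}
```

In a real deployment the server would loop on server_handle_one in its own process, while each client calls client_send followed by client_receive.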
4.      Sockets 

Sockets [Ref. SR90, CM89] are used mainly for network I/O, as opposed to file I/O within
a single machine. Network I/O needs more details and options than file I/O; the main
ones are summarized briefly here. A typical client-server relationship is not
symmetrical: the actions performed by a client differ from those performed by a
server, so before initiating a connection request a program must know which role it is
to play. A network connection can be connection-oriented or connectionless, and each
requires a different sequence of actions. Names are important in networking because it
must be possible to verify the authority for requested services, and message
boundaries are significant for network protocols. We deal mainly with
connection-oriented protocols, since we need a FIFO channel, which connectionless
protocols do not guarantee. The transport is based on TCP.

To do network I/O, the first thing a process does is call the socket system call,
specifying the type of communication protocol required. The socket can be a

•  stream socket (connection-oriented service),

•  datagram socket (connectionless service),

•  raw socket, or

•  sequenced packet socket (messages are sent with sequence numbers).

This call returns an integer, similar to a file descriptor, called sockfd. A
connection-oriented server uses the bind system call to associate the socket with its
local address and port (the local process), and the listen and accept system calls to
wait for and accept a connection from a foreign address and process. Connection-oriented
clients use the connect system call. Because the server binds a known address, the
client knows the server's address and the port number that the server uses for socket
connections.
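The server and client sequences just described can be sketched with the Berkeley socket calls. The sketch binds to the loopback address so it can be exercised on one machine; the helper names and the listen backlog of 5 are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill addr with 127.0.0.1:port (port in host byte order). */
static int loopback(struct sockaddr_in *addr, unsigned short port)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);
    return inet_pton(AF_INET, "127.0.0.1", &addr->sin_addr) == 1 ? 0 : -1;
}

/* Server: socket -> bind -> listen. Port 0 lets the kernel choose; the
   port actually bound is reported through *bound_port. */
int tcp_server(unsigned short port, unsigned short *bound_port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);   /* stream socket: TCP */

    if (fd < 0)
        return -1;
    if (loopback(&addr, port) < 0
        || bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0
        || listen(fd, 5) < 0
        || getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *bound_port = ntohs(addr.sin_port);
    return fd;
}

/* Client: socket -> connect to the server's known address and port. */
int tcp_client(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (loopback(&addr, port) < 0
        || connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The server would then call accept() on the returned descriptor to obtain a connected socket for each client; TCP delivers the bytes on that connection in FIFO order, which is the property the ring of channels relies on.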

5.      Transport  Layer  Interface 

Transport Layer Interface (TLI) [Ref. SR90, SUN] provides an interface to the transport
layer of the OSI model. It is a set of library functions that hide the actual STREAMS
interface to the networking system. The two communicating ends are called transport
endpoints in TLI. The transport provider is a set of routines in the host computer that
provide communication support to the user process. Some of the elementary functions in
TLI are

•  t_open, which is used to establish a transport endpoint by specifying the particular
transport provider.

•  t_bind, which assigns an address to the transport endpoint.

•  t_alloc, which allocates space for the various data structures used by the TLI
functions.

•  t_connect, which is used for connecting a client to a server in a
connection-oriented network.

•  t_listen, which is called by servers waiting for requests from clients.

•  t_accept, which is called by a server to accept a connection request issued by a
client's t_connect.

•  t_snd and t_rcv, which are used to send and receive data.
V.  AN EXAMPLE

In this chapter, we give an example of the execution of this protocol. The example
starts with a fixed number of members and simulates a sequence of failures and joins.
We then analyze the group view at all member sites to check that the views are
identical for every view number.

A.     INITIAL  CONDITIONS 

Assume six members in a group, p0, p1, p2, p3, p4, and p5, which form a logical ring
p0 → p1 → p2 → p3 → p4 → p5 → p0. All members have this structure in their GroupView.
Assume a ViewNumber of 6 at all sites. The StatusTable has no entries corresponding to
any member in the group. The TokenPool at all sites contains the commitP0(p5) token.
All members recognize member p0 as the host of the group. In this example, p1 is the
clockwise neighbour of p0, p2 is the clockwise neighbour of p1, and so on. The
monitoring action consists of a member querying its counter-clockwise neighbour and
getting a status report from it. Thus p0 queries p5 and gets a status report from it.
Likewise, (p1, p0), (p2, p1), (p3, p2), (p4, p3), and (p5, p4) form querying and
reporting pairs. Fig. 3.1 gives the members and other related information. The
sequence of joins and failures to be checked is:

1. Failure of member p2,

2. Join request of p6 arriving at p1 after the agreement phase for p2 is over,

3. Failure of member p5,

4. Join request of member p7 sent to member p0 almost immediately, and

5. Failure of member p0.

The next section describes in detail the sequence of actions taking place at all member
sites for this sequence of events and gives snapshots of the GroupView, StatusTable, and
TokenPool at different points in time.

B.      EXPLANATION  OF  THE  EXAMPLE 

The  example  used  here  consists  of  three  parts.  The  first  part  deals  with  a  single 
failure  while  the  second  part  deals  with  a  single  join.  The  third  part  deals  with 
multiple  failures  and  joins. 

1.      Failure  of  a  Single  Member 

The initial snapshot of parameters at all members is given in Table 5.1. The failure of
p2 is detected by p3 when the timer in p3 times out before it can receive a status
report from p2. When this happens, p3 shuts off communication from p2 and starts the
agreement phase of the MP to agree on the failure of p2. The Failure-Monitor process at
p3 queries its local GroupView manager and finds that the new member that it has to
monitor is p1. It then sends a status query to p1. It also sends an agreement token,
agreeP3(p2), for the failure of p2 to p4. It updates the TokenPool with this agreement
token, and a StatusTable entry for p2 is created with the value Failagree.

When p1 receives the status query from p3, it compares the sender address with its
previous monitor address and finds it to be different. It shuts off communication to
p2 and makes p3 its clockwise neighbor. p1 also sends its TokenPool and status report
to p3. The TokenPool consists of only the commit token for the join of p5, because we
have assumed that the last change to the membership view was the join of p5. Commit
tokens are garbage collected: when another commit token is received, the commit tokens
occurring before the agreement token for the same member are removed. Since the join of
p5 was the last change, commitP0(p5) remains in the TokenPool until there is a new
change. The status of the token is checked against the TokenPool. This token is seen to
be a duplicate by p3, and it takes no action on it.

TABLE 5.1: SNAPSHOT OF INITIAL CONDITION

Event: This is the snapshot of the member parameters initially

Member(s)               | GroupView, ViewNumber        | TokenPool     | StatusTable | Action
p0, p1, p2, p3, p4, p5  | {p0, p1, p2, p3, p4, p5}, 6  | commitP0(p5)  | Nil         |

The agreement token agreeP3(p2) travels from p3 to p4, p5, p0 and p1, and then back to p3. The snapshot of parameters at all members at this point is given in Table 5.2: at every member site the agreement token is the last token in the TokenPool, and the StatusTable entry for p2 is Failagree. When the token reaches p3, it is seen as a duplicate token. p3 then sees that it is the initiator and initiates the Commit phase. It adds the commit token commitP3(p2) to its TokenPool and removes the entry for p2 from its StatusTable. As per the garbage collection protocol, it also removes commitP0(p5), which occurs before agreeP3(p2), and then agreeP3(p2) itself, so that commitP3(p2) is the only token left in the TokenPool. The view number is incremented to 7 and p2 is removed from the GroupView. The snapshot of parameters at all members is given in Table 5.3. p3 then sends the commit token to p4. The same events happen at every member site as the commit token passes, and each member updates its GroupView and StatusTable accordingly. When p3 gets the commit token back from p1, it sees it as a duplicate token and takes no action, which completes the commit phase at all member sites. The snapshot at all member sites is given in Table 5.4.
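The garbage collection step that p3 performs when it commits can be sketched as a single C function. This is an illustrative reading of the rule described above and of the "uptknpool" branch of the TokenPool server in Appendix A, not a transcription of either. The search runs up to and including the agreement token for the committed member. Every commit token in that prefix, together with the matching agreement token, is dropped; all other tokens are kept, and the new commit token is appended. The array-based `TokenPool` and the functions `pool_add`, `agree_type_for` and `pool_commit` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_POOL 16   /* hypothetical fixed capacity for the sketch */

typedef struct {
    char type[10];      /* "joinagree", "failagree", "joincomit", "failcomit" */
    char member[9];     /* member the token concerns */
    char initiator[9];  /* member that initiated the phase */
} Token;

typedef struct {
    Token tokens[MAX_POOL];   /* kept in arrival order */
    int count;
} TokenPool;

static void pool_add(TokenPool *pool, const char *type,
                     const char *member, const char *initiator)
{
    assert(pool->count < MAX_POOL);
    Token *t = &pool->tokens[pool->count++];
    snprintf(t->type, sizeof t->type, "%s", type);
    snprintf(t->member, sizeof t->member, "%s", member);
    snprintf(t->initiator, sizeof t->initiator, "%s", initiator);
}

static bool is_commit(const Token *t)
{
    return strcmp(t->type, "joincomit") == 0 ||
           strcmp(t->type, "failcomit") == 0;
}

/* The agreement token type that a commit type closes. */
static const char *agree_type_for(const char *commit_type)
{
    return strcmp(commit_type, "joincomit") == 0 ? "joinagree" : "failagree";
}

/* Add a commit token with garbage collection: within the prefix of the
 * pool that ends at the agreement token for the same member (the whole
 * pool if there is none), drop every commit token and that agreement
 * token; keep all other tokens; then append the commit token. */
static void pool_commit(TokenPool *pool, const char *commit_type,
                        const char *member, const char *initiator)
{
    const char *agree = agree_type_for(commit_type);
    int prefix = pool->count;
    for (int i = 0; i < pool->count; i++) {
        if (strcmp(pool->tokens[i].type, agree) == 0 &&
            strcmp(pool->tokens[i].member, member) == 0) {
            prefix = i + 1;
            break;
        }
    }

    int kept = 0;
    for (int i = 0; i < pool->count; i++) {
        const Token *t = &pool->tokens[i];
        bool closes = strcmp(t->type, agree) == 0 &&
                      strcmp(t->member, member) == 0;
        if (i < prefix && (is_commit(t) || closes))
            continue;                  /* garbage-collected */
        pool->tokens[kept++] = *t;     /* compact the surviving tokens */
    }
    pool->count = kept;
    pool_add(pool, commit_type, member, initiator);
}
```

Applied to p3's pool in Table 5.2, committing the failure of p2 leaves only commitP3(p2), as in Table 5.3. Applied in turn to the commits of p0, p5 and p7, starting from p1's pool in Table 5.8, it leaves commitP1(p0), commitP6(p5) and commitP1(p7), which is the TokenPool shown for p1 in Table 5.9.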
TABLE 5.2: SNAPSHOT AT THE END OF AGREEMENT PHASE

Event: The snapshot at the end of the agreement phase

Member(s) | GroupView, ViewNumber | TokenPool | StatusTable | Action
p0, p1, p4, p5 | {p0, p1, p2, p3, p4, p5}, 6 | commitP0(p5), agreeP3(p2) | p2 Failagree | -
p3 | same as above | same as above | same as above | Initiates the Commit phase for p2
TABLE 5.3: SNAPSHOT WHEN ONLY p3 HAS COMMITTED

Event: The snapshot when only p3 has committed

Member(s) | GroupView, ViewNumber | TokenPool | StatusTable | Action
p3 | {p0, p1, p3, p4, p5}, 7 | commitP3(p2) | Nil | p3 commits to p2's departure
p0, p1, p4, p5 | {p0, p1, p2, p3, p4, p5}, 6 | commitP0(p5), agreeP3(p2) | p2 Failagree | -
TABLE 5.4: SNAPSHOT WHEN ALL MEMBERS HAVE COMMITTED p2

Event: This is the snapshot when all members commit p2

Member(s) | GroupView, ViewNumber | TokenPool | StatusTable | Action
p0, p1, p3, p4, p5 | {p0, p1, p3, p4, p5}, 7 | commitP3(p2) | Nil | -
2. Join of a Single Member

When the JoinRequest of a potential member p6 arrives at p1, that member sees that it is not the host, creates a StatusTable entry for p6 with status Joinreqst, and adds the token joinreqP1(p6) to its TokenPool. It sends this token to its clockwise neighbor p3, and the token then passes through p4 and p5 before it reaches the host p0. The snapshot at this point is given in Table 5.5. When the host gets this token, it initiates the join agreement phase by circulating the agreement token agreeP0(p6). As this token traverses the ring, it is added to the TokenPool of every member.

When the host receives the agreement token again, it initiates the commit phase. It updates its GroupView, TokenPool and StatusTable and sends the commit token commitP0(p6) to its clockwise neighbor. It then sends p6 the current contents of its GroupView, StatusTable and TokenPool and makes p6 its counter-clockwise neighbor. On receiving these initialisation parameters, p6 commits itself to the group view. The snapshot at this point is given in Table 5.6. p6 computes p5 as its counter-clockwise neighbor and sends a status query to it. On receiving the status query from p6, p5 makes p6 its clockwise neighbor and sends its TokenPool to it. This TokenPool contains agreeP0(p6), which is not a duplicate token at p6; p6 nevertheless ignores it, because it sees that it is the agreement token for the join of a member already in the GroupView. The commit token goes to all the members, and when a member receives it, joinreqP1(p6) is removed from its TokenPool. The snapshot at the end of the commit phase is given in Table 5.7.
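The ring bookkeeping used for the join of p6 can be sketched as follows. The sketch assumes, as in the GroupView server of Appendix A, that the group view is an ordered list whose first entry is the host and that a committed join appends the new member at the end. A member's counter-clockwise neighbor is taken to be the previous entry, or the last entry in the case of the host. The array-based `GroupView` and the helpers `index_of`, `commit_join`, `ccw_neighbor` and `cw_neighbor` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_MEMBERS 16   /* hypothetical capacity for the sketch */

/* Group view as an ordered list in ring order; index 0 is the host. */
typedef struct {
    char members[MAX_MEMBERS][9];
    int size;
    int view_number;
} GroupView;

static int index_of(const GroupView *gv, const char *addr)
{
    for (int i = 0; i < gv->size; i++)
        if (strcmp(gv->members[i], addr) == 0)
            return i;
    return -1;
}

/* Committing a join appends the new member at the end of the list, so in
 * the ring it sits between the last member and the host. */
static void commit_join(GroupView *gv, const char *addr)
{
    assert(gv->size < MAX_MEMBERS);
    snprintf(gv->members[gv->size], sizeof gv->members[0], "%s", addr);
    gv->size++;
    gv->view_number++;
}

/* Counter-clockwise neighbor: the previous entry, or the last entry
 * for the host. */
static const char *ccw_neighbor(const GroupView *gv, const char *addr)
{
    int i = index_of(gv, addr);
    assert(i >= 0);
    return gv->members[i == 0 ? gv->size - 1 : i - 1];
}

/* Clockwise neighbor: the next entry, wrapping from the last to the host. */
static const char *cw_neighbor(const GroupView *gv, const char *addr)
{
    int i = index_of(gv, addr);
    assert(i >= 0);
    return gv->members[(i + 1) % gv->size];
}
```

The sketch starts from the view {p0, p1, p3, p4, p5} with view number 7. Committing p6 then gives view number 8. p6's counter-clockwise neighbor is p5, which is why p6 sends its status query to p5, and p6 becomes the counter-clockwise neighbor of the host p0.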

TABLE 5.5: SNAPSHOT BEFORE INITIATING AGREEMENT FOR A JOIN

Event: This is the snapshot before the agreement phase for p6

Member(s) | GroupView, ViewNumber | TokenPool | StatusTable | Action
p0 | {p0, p1, p3, p4, p5}, 7 | commitP3(p2) | Nil | When p0 receives joinreqP1(p6), it initiates the agreement phase for the join
p1, p3, p4, p5 | {p0, p1, p3, p4, p5}, 7 | commitP3(p2), joinreqP1(p6) | p6 Joinreqst | -

3. Multiple Failures and Joins

This subsection gives an example in which a member joins the group and the initiator of the agreement phase fails after passing on the token, while another member fails at about the same time. The join request of the potential member p7 is received by the host p0, which initiates the agreement phase: it sends the agreement token agreeP0(p7) to p1 and then fails. p1 perceives the failure of the host and initiates the agreement phase for that failure. It updates its counter-clockwise neighbor and sends a status query to p6. p6 makes p1 its clockwise neighbor and sends its status report and TokenPool to p1. p1 then sends agreeP0(p7), followed by agreeP1(p0), to p3.

If p0 had failed without initiating the agreement phase, p7 would have waited for a time-out period and then sent its join request again. By that time the failure of p0 would have been committed, and p1, as the new host, would initiate the agreement phase for the join of p7.

TABLE 5.6: SNAPSHOT WHEN p6 IS COMMITTED AT THE HOST p0

Event: This is the snapshot when p6 is committed by p0

Member(s) | GroupView, ViewNumber | TokenPool | StatusTable | Action
p0 | {p0, p1, p3, p4, p5, p6}, 8 | commitP0(p6) | Nil | -
p6 | same as above | same as above | same as above | p6 commits itself and begins monitoring p5, although it is not yet committed by p5
p1, p3, p4, p5 | {p0, p1, p3, p4, p5}, 7 | commitP3(p2), joinreqP1(p6), agreeP0(p6) | p6 Joinagree | -

TABLE 5.7: SNAPSHOT WHEN ALL MEMBERS HAVE COMMITTED p6

Event: This is the snapshot when all members commit p6

Member(s) | GroupView, ViewNumber | TokenPool | StatusTable | Action
p0, p1, p3, p4, p5, p6 | {p0, p1, p3, p4, p5, p6}, 8 | commitP0(p6) | Nil | -

At this point, p6 perceives that p5 has failed and initiates an agreement phase for it. The token agreeP6(p5) is passed around the ring and reaches p1 before the agreement tokens agreeP0(p7) and agreeP1(p0) circulate back to p1. When agreeP0(p7) reaches p1, p1 finds that it is a duplicate token. It also finds that other agreement tokens are present and that the initiator of this agreement phase has failed; therefore, p1 must initiate the commit phase. It then computes the rank of every member whose agreement token is pending and finds that there are agreement tokens with smaller ranks: the rank of p0 is 0, p5 has a rank of 4, and p7 has a rank of 6. So it updates the status of p7 to JoinPending. It is assumed that agreeP6(p5) is at p4 and has not yet reached p6. Table 5.8 gives the snapshot at this time; it shows only the parameters of the operational members in the current GroupView.

p1 then receives the fail agreement token for p0, finds that its rank is the minimum, and initiates the commit for p0. On receiving agreeP6(p5), p6 finds that the rank of p5 is not the minimum and updates the StatusTable entry for p5 to FailPending. When commitP1(p0) is processed at p6, p6 inspects the other agreement tokens, finds that p5 now has the minimal rank, and initiates the commit phase for p5. When the commit for p5 is processed at p1, p1 initiates the commit for the join of p7. The snapshot is given in Table 5.9. Note that in this snapshot p3 has received neither the commit token for p5 nor the one for p7. At the end of these commit actions all the members will have identical GroupViews, although the views committed at different members may differ at a particular instant. This example shows that the MP is robust to multiple failures and joins occurring at almost the same time.
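The rank rule that orders the pending commits in this example can be sketched as below. The sketch takes the point of view of a single member that already holds the agreement tokens for all pending changes, and it ignores messaging entirely. A member's rank is assumed to be its distance from the head of the GroupView, as computed by the "rankmembr" request in Appendix A. A joining member is not yet in the list, so it is assumed to be ranked at the position it would take at the end of the ring; this reproduces the ranks 0, 4 and 6 given above for p0, p5 and p7. The `GroupView` and `Change` types and the functions `rank_of`, `next_to_commit`, `apply_change` and `commit_all` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_MEMBERS 16   /* hypothetical capacity for the sketch */

/* Group view as an ordered list: index 0 is the host. */
typedef struct {
    char members[MAX_MEMBERS][9];
    int size;
    int view_number;
} GroupView;

/* A change whose agreement token is pending at this member. */
typedef struct {
    char member[9];
    bool is_join;
} Change;

/* Rank = distance from the head of the view; a joining member is
 * ranked at the position it would occupy at the end of the ring. */
static int rank_of(const GroupView *gv, const Change *c)
{
    if (c->is_join)
        return gv->size;
    for (int i = 0; i < gv->size; i++)
        if (strcmp(gv->members[i], c->member) == 0)
            return i;
    return -1;   /* not in the view */
}

/* Index of the pending change with the smallest rank (n >= 1). */
static int next_to_commit(const GroupView *gv, const Change *pending, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (rank_of(gv, &pending[i]) < rank_of(gv, &pending[best]))
            best = i;
    return best;
}

/* Commit one change: append a joiner or remove a departed member,
 * then advance the view number. */
static void apply_change(GroupView *gv, const Change *c)
{
    if (c->is_join) {
        assert(gv->size < MAX_MEMBERS);
        snprintf(gv->members[gv->size], sizeof gv->members[0], "%s", c->member);
        gv->size++;
    } else {
        int r = rank_of(gv, c);
        assert(r >= 0);
        for (int i = r; i < gv->size - 1; i++)
            memcpy(gv->members[i], gv->members[i + 1], sizeof gv->members[0]);
        gv->size--;
    }
    gv->view_number++;
}

/* Commit every pending change, lowest rank first, re-ranking after each
 * commit; the committed members are written to order[] in sequence. */
static int commit_all(GroupView *gv, Change *pending, int n, char order[][9])
{
    int done = 0;
    while (n > 0) {
        int k = next_to_commit(gv, pending, n);
        apply_change(gv, &pending[k]);
        snprintf(order[done++], 9, "%s", pending[k].member);
        pending[k] = pending[--n];   /* drop it from the pending set */
    }
    return done;
}
```

The sketch starts from the view {p0, p1, p3, p4, p5, p6} with view number 8. It commits p0, then p5, then p7, and ends with the view {p1, p3, p4, p6, p7} and view number 11, which matches the entry for p1 in Table 5.9.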
TABLE 5.8: SNAPSHOT SHOWING MULTIPLE AGREE TOKENS

Event: Snapshot to show the pending state at the time of commit

Member(s) | GroupView, ViewNumber | TokenPool | StatusTable | Action
p1 | {p0, p1, p3, p4, p5, p6}, 8 | commitP0(p6), agreeP0(p7), agreeP1(p0), agreeP6(p5) | p7 JoinPending, p0 Failagree, p5 Failagree | Initiates commit for failure of p0 and suspends commitP1(p7)
p3, p4 | {p0, p1, p3, p4, p5, p6}, 8 | commitP0(p6), agreeP0(p7), agreeP1(p0), agreeP6(p5) | p7 Joinagree, p0 Failagree, p5 Failagree | -
p6 | {p0, p1, p3, p4, p5, p6}, 8 | commitP0(p6), agreeP6(p5), agreeP0(p7), agreeP1(p0) | p5 Failagree, p7 Joinagree, p0 Failagree | Sets status of p5 as FailPending and takes no further action on receiving agreeP6(p5)
TABLE 5.9: GROUPVIEW FOR SUCCESSIVE VIEW NUMBERS

Event: Snapshot at commit of p7 at p1

Member(s) | GroupView, ViewNumber | TokenPool | StatusTable | Action
p1 | {p1, p3, p4, p6, p7}, 11 | commitP1(p0), commitP6(p5), commitP1(p7) | Nil | commitP1(p7) is sent
p7 | same as above | same as above | same as above | -
p3 | {p1, p3, p4, p5, p6}, 9 | commitP1(p0) | p7 Failagree, p5 Failagree | -
p4 | same as above | same as above | same as above | -
p6 | {p1, p3, p4, p6}, 10 | agreeP0(p7), commitP1(p0), commitP6(p5) | p7 Failagree | -
VI. CONCLUSIONS AND FUTURE DIRECTIONS

A.  CONCLUSIONS 

In this thesis, a decentralized mechanism for providing a consistent group view has been presented. It differs from most other approaches, which rely on a centralized manager to provide a consistent group view. The proposed approach is efficient in that committing a change to the membership of a group of n members requires only 2n messages, and the number of messages is the same whichever member fails. The protocol is being implemented on a network of SUN workstations. The different types of system calls needed to implement the protocol have been identified and tested, and the programs for the various client-server communication patterns used to interface the different functions have been developed and tested.

B.  FUTURE  WORK 

There is a considerable amount of work that can be done as a continuation of this thesis. The coding of the protocol needs to be completed and its functioning observed. Various experiments should be run to characterize the latency of committing membership changes and to compare it with that of the centralized protocol implemented in the same environment. Ideally, the experiments should also measure the performance seen for reliable multicast primitives. Another extension of this thesis is to propose a formal proof of the correctness of the protocol.
APPENDIX A

A.      GROUPVIEW  SERVER 

#include <stdio.h>
#include <stdlib.h>     /* malloc, free, atoi, atol, exit */
#include <string.h>     /* strcmp, strcpy */
#include <unistd.h>     /* getpid */
#include <sys/types.h>
#include <sys/ipc.h>    /* IPC_CREAT */
#include <sys/msg.h>    /* msgget */
#include "gvmesg.h"
#include "msgq.h"

/*
 * This is a server to maintain and update the group view. It receives
 * requests from other client processes and acts according to the
 * client's request.
 */

GVMesg group_view_mesg;
typedef struct list_node *MembPtr;

typedef struct list_node {
    char    memb_addr[9];
    MembPtr next;
} MembListNode;

struct view {   /* the group view structure */
    int view_number;
    int group_size;
} group_view;

MembPtr tempptr, headptr, tailptr, tempptrl;

int server();

int main()
{
    int  id, prid;
    long key;

    /*
     * create message queue if required.
     */
    key = 1;
    prid = getpid();

    if ((id = msgget(GVSER, (PERMS | IPC_CREAT))) < 0) {
        err_sys("server: cant get message queue for GroupView server");
    }

    /*
     * do it eternally as an iterative server
     */
    while (key == 1) {
        server(id);
    }

    exit(0);
}

int server(int id)
{
    int  loop_index, loop_indexl, loop_index2, loop_index3, grpsize;
    int  grpview_num, num_bytes_read, ml, m2;
    long proc_id_long;
    char errmesg[256], *sys_err_str();

    /*
     * Read the message from the message queue
     */

    /* type for client to server messages */
    group_view_mesg.mesg_type = 1L;

    if ((num_bytes_read = gview_mesg_recv(id, &group_view_mesg)) <= 0)
        err_sys("server: message read error");

    /*
     * Convert the process id embedded as a long integer for sending
     * the reply back to the same client.
     */
    proc_id_long = atol(group_view_mesg.proc_id);
    group_view_mesg.mesg_type = proc_id_long;

    /*
     * Check the message header; there are seven possibilities.
     * "Uprmgview": remove the member from the group view.
     * "Upadgview": add the member to the end of the group.
     * "initgview": create the initial group view list from the
     *              contents of the message.
     * "neibraddr": send the anticlockwise neighbor address back to
     *              the client.
     * "rankmembr": give the rank of each member specified.
     * "hostaddrs": give the address of the host to the client.
     * "gpviewreq": send the contents of the group view list as a
     *              message.
     */

    if (strcmp(group_view_mesg.msg_header, "Uprmgview") == 0) {

        /*
         * Read in the members from the list and compare each with the
         * member to be removed; if a match is found, remove the member
         * and change the links so that the list is not broken.
         */
        tempptr = headptr;

        /*
         * Extract the member address from each list node and compare it
         * with the member to be removed. If there is no match, go on to
         * the next member until the complete list is covered.
         */
        for (loop_index = 0; loop_index < group_view.group_size;
             loop_index++) {

            if (strcmp(tempptr->memb_addr,
                       group_view_mesg.mesg_data[0].member_address) != 0) {
                tempptrl = tempptr;
                tempptr = tempptrl->next;
            }

            /*
             * If the match is for the host (first member), remove the
             * host and update the pointer to the list.
             */
            else if (loop_index == 0) {
                headptr = tempptr->next;
                if (headptr == NULL)      /* the list is now empty */
                    tailptr = NULL;
                free(tempptr);
                break;
            }

            else {
                /* If a match is found in the middle of the list, change
                 * the link to reform the list.
                 */
                tempptrl->next = tempptr->next;
                free(tempptr);
                if (tempptrl->next == NULL)
                    tailptr = tempptrl;
                break;
            }
        }

        /*
         * Update the current values of the view number and the number
         * of members: decrease the number of members and increase the
         * view number.
         */
        group_view.view_number++;
        group_view.group_size--;
    }

    else if (strcmp(group_view_mesg.msg_header, "Upadgview") == 0) {

        /*
         * This segment creates storage for the new member, updates the
         * view number and adds the address of the new member to the list.
         */
        tempptr = (MembPtr) malloc(sizeof(MembListNode));

        /*
         * Add the new member to the end of the membership list. The
         * first member is always the host, and the successive entries
         * denote the ring structure of the group membership.
         */
        strcpy(tempptr->memb_addr,
               group_view_mesg.mesg_data[0].member_address);
        tempptr->next = NULL;
        if (tailptr == NULL)              /* empty list */
            headptr = tempptr;
        else
            tailptr->next = tempptr;
        tailptr = tempptr;

        /*
         * Increment the view number and the number of members.
         */
        group_view.view_number++;
        group_view.group_size++;
    }

    else if (strcmp(group_view_mesg.msg_header, "initgview") == 0) {

        /*
         * This part of the program initialises the list structure and
         * generates the initial group view.
         */
        group_view.view_number = atoi(group_view_mesg.view_num);
        group_view.group_size = atoi(group_view_mesg.num_mem);

        /*
         * Create the list structure; the first element is the head.
         */
        for (loop_indexl = 0; loop_indexl < group_view.group_size;
             loop_indexl++) {

            tempptr = (MembPtr) malloc(sizeof(MembListNode));

            /*
             * Copy the member addresses one by one.
             */
            strcpy(tempptr->memb_addr,
                   group_view_mesg.mesg_data[loop_indexl].member_address);
            tempptr->next = NULL;

            if (loop_indexl == 0)
                headptr = tempptr;          /* head of the list */
            else
                tailptr->next = tempptr;    /* link after the current tail */
            tailptr = tempptr;
        }
    }
    else if (strcmp(group_view_mesg.msg_header, "gpviewreq") == 0) {

        grpview_num = group_view.view_number;
        grpsize = group_view.group_size;

        /*
         * Integer to ASCII conversion of the view number; the string
         * is null terminated.
         */
        itoa(grpview_num, group_view_mesg.view_num, 5);

        /*
         * Integer to ASCII conversion of the number of members; the
         * string is null terminated.
         */
        itoa(grpsize, group_view_mesg.num_mem, 3);
        printf("group size is %d\n", grpsize);
        tempptr = headptr;

        /*
         * Read the list completely and create the message to be sent
         * back. The process id is used for sending the message back
         * to the client.
         */
        for (loop_index2 = 0; loop_index2 < grpsize; loop_index2++) {
            strcpy(group_view_mesg.mesg_data[loop_index2].member_address,
                   tempptr->memb_addr);
            group_view_mesg.mesg_data[loop_index2].member_address[8] = '\0';
            tempptrl = tempptr;
            tempptr = tempptrl->next;
        }

        group_view_mesg.mesg_type = atol(group_view_mesg.proc_id);
        group_view_mesg.mesg_len = grpsize * 12 + 22;
        printf("message length = %d\n", group_view_mesg.mesg_len);
        gview_mesg_send(id, &group_view_mesg);
    }
    else if (strcmp(group_view_mesg.msg_header, "hostaddrs") == 0) {

        /*
         * Send the address pointed to by headptr as the host of the
         * group.
         */
        tempptr = headptr;
        strcpy(group_view_mesg.mesg_data[0].member_address,
               headptr->memb_addr);
        group_view_mesg.mesg_type = atol(group_view_mesg.proc_id);
        group_view_mesg.mesg_len = 12 + 22;
        gview_mesg_send(id, &group_view_mesg);
    }

    else if (strcmp(group_view_mesg.msg_header, "neibraddr") == 0) {

        /*
         * Read in the members from the list and compare each with the
         * member address given; if a match is found, the member
         * previous to the match is the anticlockwise neighbor.
         */
        tempptr = headptr;

        /*
         * Extract the member address from each list node and compare it
         * with the member address given. If there is no match, continue
         * until the complete list is covered.
         */
        for (loop_index = 0; loop_index < group_view.group_size;
             loop_index++) {

            if (strcmp(tempptr->memb_addr,
                       group_view_mesg.mesg_data[0].member_address) != 0) {
                tempptrl = tempptr;
                tempptr = tempptrl->next;
            }

            /*
             * If the match is for the host (first member), tailptr
             * (last member) is the anticlockwise neighbor.
             */
            else if (loop_index == 0) {
                strcpy(group_view_mesg.mesg_data[0].member_address,
                       tailptr->memb_addr);
                break;
            }

            else {
                /* If a match is found in the middle of the list, the
                 * anticlockwise neighbor is the previous member.
                 */
                strcpy(group_view_mesg.mesg_data[0].member_address,
                       tempptrl->memb_addr);
                break;
            }
        }

        /*
         * Send the message back to the client, filling the message
         * structure with all relevant details.
         */
        group_view_mesg.mesg_type = atol(group_view_mesg.proc_id);
        group_view_mesg.mesg_len = 12 + 22;
        gview_mesg_send(id, &group_view_mesg);
    }

    else if (strcmp(group_view_mesg.msg_header, "rankmembr") == 0) {

        /*
         * For each member address given, read in the members from the
         * list and compare; if a match is found, its distance from the
         * head pointer is the rank.
         */
        for (ml = 0; ml < atoi(group_view_mesg.num_mem); ml++) {
            tempptr = headptr;

            /*
             * Extract the member address from each list node and compare
             * it with the member address given. If there is no match,
             * continue until the complete list is covered.
             */
            for (loop_index = 0; loop_index < group_view.group_size;
                 loop_index++) {

                m2 = loop_index;

                if (strcmp(tempptr->memb_addr,
                           group_view_mesg.mesg_data[ml].member_address) != 0) {
                    tempptrl = tempptr;
                    tempptr = tempptrl->next;
                }

                else {
                    /* If a match is found, the loop index gives its rank,
                     * stored as two ASCII digits (48 is '0').
                     */
                    group_view_mesg.mesg_data[ml].member_rank[1] =
                        (m2 % 10) + 48;
                    group_view_mesg.mesg_data[ml].member_rank[0] =
                        (((m2 % 100) - m2 % 10) / 10) + 48;
                    break;
                }
            }
        }

        /*
         * Send the message back to the client, filling the message
         * structure with all relevant details.
         */
        group_view_mesg.mesg_type = atol(group_view_mesg.proc_id);
        group_view_mesg.mesg_len =
            12 * atoi(group_view_mesg.num_mem) + 22;
        gview_mesg_send(id, &group_view_mesg);
    }

    /*
     * This segment checks the list by printing it out.
     */
    tempptr = headptr;
    for (loop_index3 = 0; loop_index3 < group_view.group_size;
         loop_index3++) {
        printf("check for string %s member rank is %s\n",
               tempptr->memb_addr,
               group_view_mesg.mesg_data[loop_index3].member_rank);
        tempptrl = tempptr->next;
        tempptr = tempptrl;
    }
}
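The rank reply built by the "rankmembr" branch stores each rank as two ASCII digits, tens first, using the arithmetic shown in the code (48 is the ASCII code of '0'). The conversion, and a matching decoder that a client might use, can be sketched in isolation. `encode_rank` and `decode_rank` are hypothetical helpers. The sketch assumes ranks below 100 and writes a terminating '\0', which the server code itself does not do.

```c
#include <assert.h>

/* Encode a rank in 0..99 as two ASCII digits, tens first, with the same
 * arithmetic as the "rankmembr" branch of the GroupView server. */
static void encode_rank(int rank, char out[3])
{
    assert(rank >= 0 && rank < 100);
    out[0] = (char)((((rank % 100) - rank % 10) / 10) + '0');
    out[1] = (char)((rank % 10) + '0');
    out[2] = '\0';
}

/* Recover the integer rank from the two digits. */
static int decode_rank(const char digits[2])
{
    return (digits[0] - '0') * 10 + (digits[1] - '0');
}
```

For example, a rank of 6 is sent as the characters '0' and '6'.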

B.  TOKENPOOL  SERVER 

#include <stdio.h>
#include <stdlib.h>     /* malloc, free, atoi, atol, exit */
#include <string.h>     /* strcmp, strcpy */
#include <unistd.h>     /* getpid */
#include <sys/types.h>
#include <sys/ipc.h>    /* IPC_CREAT */
#include <sys/msg.h>    /* msgget */
#include "tkpmesg.h"
#include "msgq.h"

/* This program maintains and manages the token pool.
 * It receives messages from clients and acts on them accordingly.
 */

TKPMesg token_pool_mesg, mesgl;
typedef struct list_node *MembPtr;

typedef struct list_node {      /* Token pool structure */
    char    initiator_addr[9];
    char    memb_addr[9];
    char    token_type[10];     /* token type of member */
    MembPtr next;
} TokenPoolNode;

struct view {   /* the token pool entries */
    int number_of_entries;
} token_view;

MembPtr tempptr, headptr, tailptr, tempptrl;
int tokenpool_server();

int main()
{
    int  id, prid;
    long key;

    /*
     * create message queue if required.
     */
    key = 1;
    prid = getpid();

    if ((id = msgget(TKPSER, (PERMS | IPC_CREAT))) < 0) {
        err_sys("server: cant get message queue for tokenpool server");
    }

    /*
     * do it eternally as an iterative server
     */
    while (key == 1) {
        tokenpool_server(id);
    }

    exit(0);
}

int tokenpool_server(int id)
{
    int  loop_index, loop_indexl, loop_index2, loop_index3, grpsize;
    int  grpview_num, num_bytes_read, token_found, number_search;
    long proc_id_long;
    char errmesg[256], *sys_err_str();

    /*
     * Read the message from the message queue
     */

    /* type for client to server messages */
    token_pool_mesg.mesg_type = 1L;

    if ((num_bytes_read = tkp_mesg_recv(id, &token_pool_mesg)) <= 0)
        err_sys("server: message read error");
    proc_id_long = atol(token_pool_mesg.proc_id);
    token_pool_mesg.mesg_type = proc_id_long;

    /*
     * Check the message header; there are five possibilities.
     * "Tokenstat": report whether the token is old (present) or new.
     * "add_token": add the token at the end of the pool.
     * "uptknpool": traverse the list until the agreement token for that
     *              member is found; purge the commit tokens before it
     *              and the agreement token itself, then add the commit
     *              token at the end of the token pool.
     * "tkpoolreq": send the current token pool as a message.
     * "initkpool": create a list from the message supplied.
     */


    if (strcmp(token_pool_mesg.msg_header, "Tokenstat") == 0) {

        /*
         * This segment checks the list to find out whether the token
         * is present or not.
         */
        tempptr = headptr;
        token_found = 0;

        for (loop_index = 0; loop_index < token_view.number_of_entries;
             loop_index++) {

            if ((strcmp(tempptr->memb_addr,
                        token_pool_mesg.mesg_data[0].member_address) == 0) &&
                (strcmp(tempptr->token_type,
                        token_pool_mesg.mesg_data[0].token_type) == 0)) {
                token_found = 1;
                break;
            }
            else {
                tempptr = tempptr->next;
            }
        }

        if (token_found == 0) {
            strcpy(token_pool_mesg.msg_header, "notpresnt");
        } else {
            strcpy(token_pool_mesg.msg_header, "yespresnt");
        }

        token_pool_mesg.mesg_len = 28 + 17;
        tkp_mesg_send(id, &token_pool_mesg);
    }

    else if (strcmp(token_pool_mesg.msg_header, "add_token") == 0) {

        /* This segment adds the token to the end of the list after
         * allocating storage for it.
         */
        tempptr = (MembPtr) malloc(sizeof(TokenPoolNode));
        tempptr->next = NULL;
        if (tailptr == NULL)            /* empty pool */
            headptr = tempptr;
        else
            tailptr->next = tempptr;
        tailptr = tempptr;
        strcpy(tempptr->token_type,
               token_pool_mesg.mesg_data[0].token_type);
        strcpy(tempptr->memb_addr,
               token_pool_mesg.mesg_data[0].member_address);
        strcpy(tempptr->initiator_addr,
               token_pool_mesg.mesg_data[0].initiator_address);
        token_view.number_of_entries++;
    }

    else if (strcmp(token_pool_mesg.msg_header, "uptknpool") == 0) {

        /*
         * This segment updates the token pool with the commit token.
         * The commit tokens up to the agreement token of the member
         * address, and that agreement token itself, are purged.
         */
        tempptr = headptr;
        number_search = 0;

        for (loop_index = 0; loop_index < token_view.number_of_entries;
             loop_index++) {

            /* search the list until the agreement token for the commit
             * action is found.
             */
            number_search++;

            if ((strcmp(tempptr->memb_addr,
                        token_pool_mesg.mesg_data[0].member_address) == 0) &&
                (strcmp(token_pool_mesg.mesg_data[0].token_type,
                        "joincomit") == 0) &&
                (strcmp(tempptr->token_type, "joinagree") == 0)) {
                break;
            }

            else if ((strcmp(tempptr->memb_addr,
                             token_pool_mesg.mesg_data[0].member_address) == 0) &&
                     (strcmp(token_pool_mesg.mesg_data[0].token_type,
                             "failcomit") == 0) &&
                     (strcmp(tempptr->token_type, "failagree") == 0)) {
                break;
            }

            else {
                tempptr = tempptr->next;
            }
        }

        /* among the first number_search entries, remove every commit
         * token and the matching agreement token */
        tempptr = headptr;

        for (loop_indexl = 0; loop_indexl < number_search; loop_indexl++) {

            if (!((strcmp(tempptr->token_type, "joincomit") == 0) ||
                  (strcmp(tempptr->token_type, "failcomit") == 0) ||
                  ((strcmp(tempptr->memb_addr,
                           token_pool_mesg.mesg_data[0].member_address) == 0) &&
                   (((strcmp(token_pool_mesg.mesg_data[0].token_type,
                             "failcomit") == 0) &&
                     (strcmp(tempptr->token_type, "failagree") == 0)) ||
                    ((strcmp(token_pool_mesg.mesg_data[0].token_type,
                             "joincomit") == 0) &&
                     (strcmp(tempptr->token_type, "joinagree") == 0)))))) {
                /* keep this token and move on */
                tempptrl = tempptr;
                tempptr = tempptrl->next;
            }

            /*
             * If the token to be removed is at the head of the list,
             * update the pointer to the head of the list.
             */
            else if (tempptr == headptr) {
                headptr = tempptr->next;
                if (headptr == NULL)        /* the pool is now empty */
                    tailptr = NULL;
                token_view.number_of_entries--;
                free(tempptr);
                tempptr = headptr;
            }

            else {
                /* If the token is in the middle of the list, change the
                 * link to reform the list.
                 */
                tempptrl->next = tempptr->next;
                token_view.number_of_entries--;
                free(tempptr);
                tempptr = tempptrl->next;
                if (tempptrl->next == NULL)
                    tailptr = tempptrl;
            }
        }

        /*
         * add the commit token to the end of the list.
         */
        tempptr = (MembPtr) malloc(sizeof(TokenPoolNode));
        tempptr->next = NULL;
        if (tailptr == NULL)
            headptr = tempptr;
        else
            tailptr->next = tempptr;
        tailptr = tempptr;
        strcpy(tempptr->token_type,
               token_pool_mesg.mesg_data[0].token_type);
        strcpy(tempptr->memb_addr,
               token_pool_mesg.mesg_data[0].member_address);
        strcpy(tempptr->initiator_addr,
               token_pool_mesg.mesg_data[0].initiator_address);
        token_view.number_of_entries++;
    }

    else if (strcmp(token_pool_mesg.msg_header, "initkpool") == 0) {

        /*
         * This part of the program initialises the list structure and
         * generates the initial token pool.
         */
        token_view.number_of_entries = atoi(token_pool_mesg.num_mem);

        /*
         * Create the list structure; the first element is the head.
         */
        for (loop_indexl = 0; loop_indexl < token_view.number_of_entries;
             loop_indexl++) {

            tempptr = (MembPtr) malloc(sizeof(TokenPoolNode));

            /*
             * Copy the tokens one by one.
             */
            strcpy(tempptr->memb_addr,
                   token_pool_mesg.mesg_data[loop_indexl].member_address);
            strcpy(tempptr->initiator_addr,
                   token_pool_mesg.mesg_data[loop_indexl].initiator_address);
            strcpy(tempptr->token_type,
                   token_pool_mesg.mesg_data[loop_indexl].token_type);
            tempptr->next = NULL;

            if (loop_indexl == 0)
                headptr = tempptr;          /* head of the list */
            else
                tailptr->next = tempptr;    /* link after the current tail */
            tailptr = tempptr;
        }
    }
    else if (strcmp(token_pool_mesg.msg_header, "tkpoolreq") == 0) {

        grpsize = token_view.number_of_entries;

        /*
         * Integer to ASCII conversion of the number of tokens; the
         * string is null terminated.
         */
        itoa(grpsize, token_pool_mesg.num_mem, 3);

        tempptr = headptr;

        for (loop_index2 = 0; loop_index2 < grpsize; loop_index2++) {
            strcpy(token_pool_mesg.mesg_data[loop_index2].member_address,
                   tempptr->memb_addr);
            strcpy(token_pool_mesg.mesg_data[loop_index2].initiator_address,
                   tempptr->initiator_addr);
            strcpy(token_pool_mesg.mesg_data[loop_index2].token_type,
                   tempptr->token_type);
            token_pool_mesg.mesg_data[loop_index2].member_address[8] = '\0';
            tempptrl = tempptr;
            tempptr = tempptrl->next;
        }

        token_pool_mesg.mesg_type = atol(token_pool_mesg.proc_id);
        token_pool_mesg.mesg_len = grpsize * 28 + 17;
        tkp_mesg_send(id, &token_pool_mesg);
    }


    /*
     * This segment of the program prints the list after each change
     * so that its contents can be checked.
     */
    tempptr = headptr;
    for (loop_index3 = 0; loop_index3 < token_view.number_of_entries;
         loop_index3++) {
        printf("check for string %s\n", tempptr->memb_addr);
        printf("check for string %s\n", tempptr->token_type);
        tempptrl = tempptr->next;
        tempptr = tempptrl;
    }
}
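Both "initkpool" here and "inittable" in the status table server build their list by appending each new node at a tail pointer. The following is a minimal standalone sketch of that technique, not the thesis code: the Node type and build_list function are hypothetical, and the 9-byte address field is assumed from memb_addr[9] in the status table server. Setting the head on the first iteration, and linking through the old tail only on later iterations, keeps a one-member list from pointing back at itself.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type mirroring the listing's list nodes:
 * a 9-byte member address (8 characters plus NUL) and a link. */
typedef struct node {
    char memb_addr[9];
    struct node *next;
} Node;

/* Build a singly linked list from an array of addresses by appending
 * at a tail pointer. Returns the head (NULL for an empty array) and
 * stores the last node in *tail_out. */
Node *build_list(const char *addrs[], int count, Node **tail_out)
{
    Node *head = NULL;
    Node *tail = NULL;
    int i;

    for (i = 0; i < count; i++) {
        Node *n = malloc(sizeof *n);
        if (n == NULL)
            break;                          /* out of memory: keep what we have */
        strncpy(n->memb_addr, addrs[i], sizeof n->memb_addr - 1);
        n->memb_addr[sizeof n->memb_addr - 1] = '\0';
        n->next = NULL;
        if (head == NULL)
            head = n;                       /* first node becomes the head */
        else
            tail->next = n;                 /* later nodes hang off the tail */
        tail = n;
    }
    *tail_out = tail;
    return head;
}
```

Because every node is created with a NULL link and only an existing tail is ever re-linked, the returned tail always ends the list.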


C.  STATUS  TABLE  SERVER 

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "stmesg.h"
#include "msgq.h"

/*
 * This program maintains and manages the status table.
 * It receives messages from clients and acts on them accordingly.
 */

STMesg status_table_mesg, mesgl;

typedef struct list_node *MembPtr;

typedef struct list_node {      /* status table entry */
    char memb_addr[9];          /* member address */
    char member_status[10];     /* status of member */
    MembPtr next;
} StatusTableNode;

struct view {                   /* the status table entries */
    int number_of_entries;
} status_view;

MembPtr tempptr, headptr, tailptr, tempptrl;
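The message lengths computed in this server (19 + 17 for a one-entry reply, grpsize*19 + 17 for the whole table, ml*19 + 17 for "getmemadr") appear to follow from the two character fields of an entry: 9 bytes of member address plus 10 bytes of status. The sketch below restates that arithmetic. StatusEntry, ST_HEADER_BYTES and st_mesg_len are hypothetical names, and reading the constant 17 as a fixed header size is an assumption taken from the listing, not from stmesg.h, which is not shown here.

```c
#include <stddef.h>

/* Hypothetical per-entry layout matching the two character fields
 * of StatusTableNode: 9 bytes of address plus 10 bytes of status.
 * A struct made only of char arrays needs no padding on common ABIs. */
typedef struct {
    char member_address[9];
    char member_status[10];
} StatusEntry;

/* Bytes the listing adds to every message (assumed fixed header). */
#define ST_HEADER_BYTES 17

/* mesg_len for a status-table message carrying n entries,
 * as computed in the listing: n * 19 + 17. */
size_t st_mesg_len(int n)
{
    return (size_t) n * sizeof(StatusEntry) + ST_HEADER_BYTES;
}
```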

main()
{
    int id, prid;
    long key;

    /*
     * Create the message queue for the status table server if required.
     */
    key = 1;
    prid = getpid();
    if ((id = msgget(STSER, (PERMS | IPC_CREAT))) < 0) {
        err_sys("server: can't get message queue for status table server");
    }

    /*
     * Run forever as an iterative server.
     */
    while (key == 1) {
        status_table_server(id);
    }
    exit(0);
}


status_table_server(id)
int id;
{
    int loop_index, loop_indexl, loop_index2, loop_index3, grpsize;
    int grpview_num, num_bytes_read, address_found, ml;
    long proc_id_long;
    char errmesg[256], *sys_err_str();

    /*
     * Read the message from the message queue; type 1 is used for
     * client-to-server messages.
     */
    status_table_mesg.mesg_type = 1L;
    if ((num_bytes_read = st_mesg_recv(id, &status_table_mesg)) <= 0)
        err_sys("server: message read error");

    /* the reply is typed with the process id supplied by the client */
    proc_id_long = atol(status_table_mesg.proc_id);
    status_table_mesg.mesg_type = proc_id_long;


    /*
     * Check the message header. There are seven possibilities:
     *   "statmembr"  send the member's status, if the member is present.
     *   "Updtstats"  update the member's status; a member not yet in the
     *                table is added at the end with the given status.
     *   "statblreq"  send the current status table as a message.
     *   "inittable"  create the table from the entries in the message.
     *   "removmemb"  remove the member's entry from the status table.
     *   "checkpend"  send the address of a member whose status is either
     *                failpending ("failpendg") or joinpending ("joinpendg").
     *   "getmemadr"  return the addresses of all members having the
     *                status given in the message.
     */


    if (strcmp(status_table_mesg.msg_header, "checkpend") == 0) {
        tempptr = headptr;
        status_table_mesg.num_mem[0] = '0';     /* "00": no pending member */
        status_table_mesg.num_mem[1] = '0';
        status_table_mesg.num_mem[2] = '\0';

        for (loop_index = 0;
             loop_index < status_view.number_of_entries; loop_index++) {

            /*
             * Check whether any member in the status table has a pending
             * status; if so, send its address. The num_mem field signals
             * whether a member with a pending status is present.
             */
            if ((strcmp(tempptr->member_status, "joinpendg") == 0) ||
                (strcmp(tempptr->member_status, "failpendg") == 0)) {
                strcpy(status_table_mesg.mesg_data[0].member_status,
                       tempptr->member_status);
                strcpy(status_table_mesg.mesg_data[0].member_address,
                       tempptr->memb_addr);
                status_table_mesg.num_mem[1] = '1';  /* "01": one member found */
                break;
            }
            else {
                tempptr = tempptr->next;
            }
        }

        /* reply once, after the search, whether or not a member was found */
        status_table_mesg.mesg_len = 19 + 17;
        status_table_mesg.mesg_type = atol(status_table_mesg.proc_id);
        st_mesg_send(id, &status_table_mesg);
    }

    if (strcmp(status_table_mesg.msg_header, "statmembr") == 0) {

        /*
         * This segment of the program gets the current status of the
         * member. If the member address is not present, the status
         * field of the reply says so.
         */
        tempptr = headptr;
        address_found = 0;

        /*
         * Search the table for the entry whose member address matches
         * the address in the message. If it is found, copy its current
         * status into the reply.
         */
        for (loop_index = 0;
             loop_index < status_view.number_of_entries; loop_index++) {
            if (strcmp(tempptr->memb_addr,
                       status_table_mesg.mesg_data[0].member_address) == 0) {
                strcpy(status_table_mesg.mesg_data[0].member_status,
                       tempptr->member_status);
                address_found = 1;
                break;
            }
            else {
                tempptr = tempptr->next;
            }
        }

        /*
         * If the address is not found, report the status as "notpresnt".
         */
        if (address_found == 0)
            strcpy(status_table_mesg.mesg_data[0].member_status,
                   "notpresnt");

        /*
         * Send the reply to the client after filling in the message
         * structure with the appropriate data.
         */
        status_table_mesg.mesg_len = 19 + 17;
        status_table_mesg.mesg_type = atol(status_table_mesg.proc_id);
        st_mesg_send(id, &status_table_mesg);
    }

    else if (strcmp(status_table_mesg.msg_header, "Updtstats") == 0) {

        /*
         * This segment updates the status of the member. If the member
         * is already present, its status is updated; otherwise the member
         * and its status are added at the tail of the list.
         */
        tempptr = headptr;
        address_found = 0;

        for (loop_index = 0;
             loop_index < status_view.number_of_entries; loop_index++) {
            if (strcmp(tempptr->memb_addr,
                       status_table_mesg.mesg_data[0].member_address) == 0) {
                strcpy(tempptr->member_status,
                       status_table_mesg.mesg_data[0].member_status);
                address_found = 1;
                break;
            }
            else {
                tempptr = tempptr->next;
            }
        }

        /*
         * If the member is not already present, add the member with
         * the status specified.
         */
        if (address_found == 0) {
            tempptr = (MembPtr) malloc(sizeof(StatusTableNode));
            tempptr->next = NULL;
            if (headptr == NULL)
                headptr = tempptr;          /* the table was empty */
            else
                tailptr->next = tempptr;    /* append after the old tail */
            tailptr = tempptr;
            strcpy(tempptr->memb_addr,
                   status_table_mesg.mesg_data[0].member_address);
            strcpy(tempptr->member_status,
                   status_table_mesg.mesg_data[0].member_status);
            status_view.number_of_entries++;
        }
    }

    else if (strcmp(status_table_mesg.msg_header, "inittable") == 0) {

        /*
         * This part of the program initialises the
         * list structure and generates the initial status table.
         */
        status_view.number_of_entries = atoi(status_table_mesg.num_mem);

        /*
         * Create the list structure; the first element
         * becomes the head of the list.
         */
        for (loop_indexl = 0; loop_indexl < status_view.number_of_entries;
             loop_indexl++) {
            tempptr = (MembPtr) malloc(sizeof(StatusTableNode));

            /*
             * Copy the entries for each member in turn.
             */
            strcpy(tempptr->memb_addr,
                   status_table_mesg.mesg_data[loop_indexl].member_address);
            strcpy(tempptr->member_status,
                   status_table_mesg.mesg_data[loop_indexl].member_status);

            /* link the node in; the first node becomes the head */
            tempptr->next = NULL;
            if (loop_indexl == 0)
                headptr = tempptr;          /* head of the list */
            else
                tailptr->next = tempptr;    /* append after the old tail */
            tailptr = tempptr;              /* new tail */
        }
    }
    else if (strcmp(status_table_mesg.msg_header, "getmemadr") == 0) {
        grpsize = status_view.number_of_entries;

        /*
         * Create the message from the list, filling in all the other
         * data required for the message to be sent.
         */
        tempptr = headptr;
        ml = 0;

        /*
         * Compare the status of each member in the list with the status
         * requested (the code tests for "failagree"). Each matching
         * member is added to the message and the item count is
         * incremented.
         */
        for (loop_index2 = 0; loop_index2 < grpsize; loop_index2++) {
            if (strcmp(tempptr->member_status, "failagree") == 0) {
                strcpy(status_table_mesg.mesg_data[ml].member_address,
                       tempptr->memb_addr);
                strcpy(status_table_mesg.mesg_data[ml].member_status,
                       tempptr->member_status);
                ml++;
            }
            tempptr = tempptr->next;
        }

        /* two ASCII digits: tens in num_mem[0], ones in num_mem[1] */
        status_table_mesg.num_mem[1] = ml % 10 + '0';
        status_table_mesg.num_mem[0] = (ml % 100 - ml % 10) / 10 + '0';
        status_table_mesg.num_mem[2] = '\0';

        status_table_mesg.mesg_type = atol(status_table_mesg.proc_id);
        status_table_mesg.mesg_len = ml * 19 + 17;
        st_mesg_send(id, &status_table_mesg);
    }

    else if (strcmp(status_table_mesg.msg_header, "statblreq") == 0) {
        grpsize = status_view.number_of_entries;

        /*
         * Integer-to-ASCII conversion of the number of members;
         * the string is null terminated.
         */
        status_table_mesg.num_mem[1] = grpsize % 10 + '0';
        status_table_mesg.num_mem[0] = (grpsize % 100 - grpsize % 10) / 10 + '0';
        status_table_mesg.num_mem[2] = '\0';

        /*
         * Create the message from the list, filling in all the other
         * data required for the message to be sent.
         */
        tempptr = headptr;
        for (loop_index2 = 0; loop_index2 < grpsize; loop_index2++) {
            strcpy(status_table_mesg.mesg_data[loop_index2].member_address,
                   tempptr->memb_addr);
            strcpy(status_table_mesg.mesg_data[loop_index2].member_status,
                   tempptr->member_status);
            status_table_mesg.mesg_data[loop_index2].member_address[8] = '\0';
            printf(" running in loop\n");
            tempptrl = tempptr;
            tempptr = tempptrl->next;
        }

        status_table_mesg.mesg_type = atol(status_table_mesg.proc_id);
        status_table_mesg.mesg_len = grpsize * 19 + 17;
        printf("message length = %d\n", status_table_mesg.mesg_len);
        st_mesg_send(id, &status_table_mesg);
    }

    else if (strcmp(status_table_mesg.msg_header, "removmemb") == 0) {

        /*
         * Read the members from the list and compare each with the
         * member to be removed. If a match is found, remove the member
         * and relink the list so that it is not broken.
         */
        tempptr = headptr;

        /*
         * Compare the member address of each entry with the member to be
         * removed; if there is no match, go to the next member until the
         * whole list has been covered.
         */
        for (loop_index = 0; loop_index < status_view.number_of_entries;
             loop_index++) {
            if (strcmp(tempptr->memb_addr,
                       status_table_mesg.mesg_data[0].member_address) != 0) {
                tempptrl = tempptr;             /* remember the previous node */
                tempptr = tempptrl->next;
            }

            /*
             * If the match is for the host (first member), remove the
             * host and update the pointer to the list.
             */
            else if (loop_index == 0) {
                headptr = tempptr->next;
                if (headptr == NULL)            /* the table is now empty */
                    tailptr = NULL;
                free(tempptr);
                status_view.number_of_entries--;
                break;
            }

            else {
                /*
                 * If a match is found further down the list, change the
                 * link to re-form the list.
                 */
                tempptrl->next = tempptr->next;
                if (tempptrl->next == NULL)     /* removed the tail */
                    tailptr = tempptrl;
                free(tempptr);
                status_view.number_of_entries--;
                break;
            }
        }
    }

    /* print the table after each request so that it can be checked */
    tempptr = headptr;
    for (loop_index3 = 0; loop_index3 < status_view.number_of_entries;
         loop_index3++) {
        printf("check for string %s\n", tempptr->memb_addr);
        printf("check for string %s\n", tempptr->member_status);
        tempptrl = tempptr->next;
        tempptr = tempptrl;
    }
}
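The "Updtstats" and "removmemb" branches above edit the list in place while status_view.number_of_entries bounds every traversal, so each insertion or removal has to keep the count, the head and the tail in step. The following standalone sketch shows that bookkeeping under stated assumptions: StTable, StNode and all the functions are hypothetical; st_remove uses the pointer-to-pointer unlinking idiom instead of the listing's separate previous-node pointer; and encode_count reproduces the two-ASCII-digit num_mem encoding (tens in num_mem[0], ones in num_mem[1]) used by "getmemadr" and "statblreq", which limits a count to 0..99.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical standalone version of the status table list. */
typedef struct st_node {
    char memb_addr[9];
    char member_status[10];
    struct st_node *next;
} StNode;

typedef struct {
    StNode *head;
    StNode *tail;
    int count;              /* plays the role of number_of_entries */
} StTable;

/* Copy src into a fixed-size field, always NUL-terminating. */
static void copy_field(char *dst, size_t size, const char *src)
{
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}

/* Like "Updtstats": update the member's status, appending the member
 * if absent. Returns 0 on success, -1 if memory runs out. */
int st_update(StTable *t, const char *addr, const char *status)
{
    StNode *p;

    for (p = t->head; p != NULL; p = p->next) {
        if (strcmp(p->memb_addr, addr) == 0) {
            copy_field(p->member_status, sizeof p->member_status, status);
            return 0;
        }
    }
    p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    copy_field(p->memb_addr, sizeof p->memb_addr, addr);
    copy_field(p->member_status, sizeof p->member_status, status);
    p->next = NULL;
    if (t->head == NULL)
        t->head = p;                /* empty table: new node is the head */
    else
        t->tail->next = p;          /* otherwise append after the tail */
    t->tail = p;
    t->count++;
    return 0;
}

/* Like "removmemb": unlink and free the member, keeping tail and count
 * consistent. Returns 1 if removed, 0 if not found. */
int st_remove(StTable *t, const char *addr)
{
    StNode **link = &t->head;       /* the pointer that refers to cur */
    StNode *prev = NULL;

    while (*link != NULL) {
        StNode *cur = *link;
        if (strcmp(cur->memb_addr, addr) == 0) {
            *link = cur->next;      /* bypass the node */
            if (t->tail == cur)
                t->tail = prev;     /* removed the last node */
            free(cur);
            t->count--;
            return 1;
        }
        prev = cur;
        link = &cur->next;
    }
    return 0;
}

/* Encode a count 0..99 as the two ASCII digits used in num_mem. */
void encode_count(char out[3], int n)
{
    out[0] = (char) ('0' + (n % 100) / 10);   /* tens digit */
    out[1] = (char) ('0' + n % 10);           /* ones digit */
    out[2] = '\0';
}
```

The expression (n % 100) / 10 used here equals the listing's (n % 100 - n % 10) / 10, because integer division discards the ones digit either way.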


D.      COMMIT  PROCESS  SERVER 

#define TRUE  1
#define FALSE 0
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "commesg.h"
#include "tkpmesg.h"
#include "gvmesg.h"
#include "stmesg.h"
#include "injoinmesg.h"
#include "msgq.h"

TKPMesg tkpmesg;
GVMesg gvmesg;
STMesg stmesg;
INJOINMesg injoin_mesg;
COMMesg commit_mesg;
main()
{
    int comid, key;

    /*
     * Create and open the single message queue for commit processing.
     */
    if ((comid = msgget(COMMITQ, (PERMS | IPC_CREAT))) < 0)
        err_sys("Commit_process: can't msgget message queue");

    key = 1;
    if (key == 1) {
        Commit_process(comid);
    }
    exit(0);
}


Commit_process(comid)
int comid;
{
    int n, nl, i, rank_member, ml, rankpend;
    char cl, temp[MAXMESGDATA], host_address[9], pending_member[9];
    char my_address[9], pending_status[10];
    int tkpid, stid, gvid, commit_pend_process, new_token;

    /*
     * Open the message queues for the group view server, token pool
     * server and status table server.
     */
    if ((tkpid = msgget(TKPSER, 0)) < 0)
        err_sys("Commit_process: can't msgget tokenpool server queue");
    if ((gvid = msgget(GVSER, 0)) < 0)
        err_sys("Commit_process: can't msgget groupview server queue");
    if ((stid = msgget(STSER, 0)) < 0)
        err_sys("Commit_process: can't msgget status table server queue");
    /* if ((injoinid = msgget(INJOINSER, 0)) < 0)
     *     err_sys("Commit_process: can't msgget initiate join queue");
     */

    /*
     * Wait for the commit-initiate message.
     */
    new_token = TRUE;
    commit_pend_process = FALSE;
    commit_mesg.mesg_type = 1;
    n = commit_mesg_recv(comid, &commit_mesg);
    if (n < 0)
        err_sys("data read error");

    while (new_token || commit_pend_process) {
        commit_pend_process = FALSE;
        new_token = FALSE;
        /*
         * Send a query to the token pool server to see whether this is
         * an old token, filling in the details the token pool server
         * needs to receive the message.
         */
        nl = getpid();

        /* fill in the process id field in the message */
        itoa(tkpmesg.proc_id, nl, 5);

        /* fill in the message header and number-of-members fields */
        strcpy(tkpmesg.msg_header, "tokenstat");
        strcpy(tkpmesg.num_mem, "01");

        /* fill in the token details in the message */
        strcpy(tkpmesg.mesg_data[0].token_type,
               commit_mesg.mesg_data.token_type);
        strcpy(tkpmesg.mesg_data[0].initiator_address,
               commit_mesg.mesg_data.initiator_address);
        strcpy(tkpmesg.mesg_data[0].member_address,
               commit_mesg.mesg_data.member_address);

        tkpmesg.mesg_len = (28 + 17);
        tkpmesg.mesg_type = 1L;
        tkp_mesg_send(tkpid, &tkpmesg);

        /* receive the reply from the token pool server */
        tkpmesg.mesg_type = nl % 10000;
        n = tkp_mesg_recv(tkpid, &tkpmesg);
        if (n < 0)
            err_sys("data read error");

        if (strcmp(tkpmesg.msg_header, "notpresnt") != 0) {

            /*
             * Send an update-token-pool message to the token pool server;
             * the receive above overwrote the type and length, so reset
             * them to address the server.
             */
            strcpy(tkpmesg.msg_header, "uptknpool");
            tkpmesg.mesg_type = 1L;
            tkpmesg.mesg_len = (28 + 17);
            tkp_mesg_send(tkpid, &tkpmesg);

            /*
             * Send an update-status-table message to the status table server.
             */
            strcpy(stmesg.msg_header, "removmemb");

            /* fill in the process id field in the message */
            itoa(stmesg.proc_id, nl, 5);

            /* fill in the other fields in the message */
            strcpy(stmesg.num_mem, "01");
            strcpy(stmesg.mesg_data[0].member_address,
                   commit_mesg.mesg_data.member_address);
            strcpy(stmesg.mesg_data[0].member_status, "removmemb");
            stmesg.mesg_len = (19 + 17);
            stmesg.mesg_type = 1L;
            st_mesg_send(stid, &stmesg);

            /*
             * Send an update-group-view message to the group view server.
             */
            if (strcmp(commit_mesg.mesg_data.token_type, "joincomit") == 0)
                strcpy(gvmesg.msg_header, "Upadgview");
            else
                strcpy(gvmesg.msg_header, "Uprmgview");

            /* fill in the process id field in the message */
            itoa(gvmesg.proc_id, nl, 5);

            /* fill in the other fields in the message */
            strcpy(gvmesg.num_mem, "01");
            strcpy(gvmesg.mesg_data[0].member_address,
                   commit_mesg.mesg_data.member_address);
            strcpy(gvmesg.view_num, "0000");
            gvmesg.mesg_len = (12 + 22);
            gvmesg.mesg_type = 1L;
            gview_mesg_send(gvid, &gvmesg);

            /*
             * Ask the group view server for the host address of the group.
             */
            strcpy(gvmesg.msg_header, "hostaddrs");
            gview_mesg_send(gvid, &gvmesg);

            gvmesg.mesg_type = nl % 10000;
            n = gview_mesg_recv(gvid, &gvmesg);
            if (n < 0)
                err_sys("data read error");
            strcpy(host_address, gvmesg.mesg_data[0].member_address);

            /*
             * The way to send tokens and to update the anticlockwise
             * member is still pending.
             */

            /*
             * Check whether the running process is the host (disabled):
             *
             * if ((strcmp(commit_mesg.mesg_data.token_type, "joincomit") == 0) &&
             *     (strcmp(host_address, my_address) == 0)) {
             *     (fill the initiate-join message with appropriate values)
             *     strcpy(injoin_mesg.msg_header, "sendinitp");
             *     injoin_mesg.mesg_type = 1L;
             *     strcpy(injoin_mesg.member_address,
             *            commit_mesg.mesg_data.member_address);
             *     injoin_mesg.mesg_len = 20;
             *     injoin_mesg_send(injoinid, &injoin_mesg);
             * }
             */

            /*
             * Send a check-commit-pending message to the status table server.
             */

            strcpy(stmesg.msg_header, "checkpend");

            /* fill in the process id field in the message */
            itoa(stmesg.proc_id, nl, 5);

            /* fill in the other fields in the message */
            strcpy(stmesg.num_mem, "01");
            strcpy(stmesg.mesg_data[0].member_address,
                   commit_mesg.mesg_data.member_address);
            strcpy(stmesg.mesg_data[0].member_status, "failpendg");
            stmesg.mesg_len = (19 + 17);
            stmesg.mesg_type = 1L;
            st_mesg_send(stid, &stmesg);

            stmesg.mesg_type = nl % 10000;
            n = st_mesg_recv(stid, &stmesg);
            if (n < 0)
                err_sys("data read error");

            if (atoi(stmesg.num_mem) != 0) {
                commit_pend_process = TRUE;
                strcpy(pending_member, stmesg.mesg_data[0].member_address);
                strcpy(pending_status, stmesg.mesg_data[0].member_status);

                /* ask the group view server for the pending member's rank */
                strcpy(gvmesg.mesg_data[0].member_address, pending_member);
                strcpy(gvmesg.msg_header, "rankmembr");
                gvmesg.mesg_type = 1L;
                gview_mesg_send(gvid, &gvmesg);

                gvmesg.mesg_type = nl % 10000;
                n = gview_mesg_recv(gvid, &gvmesg);
                if (n < 0)
                    err_sys("data read error");
                rankpend = atoi(gvmesg.num_mem);


                /* ask for all members whose status is a failure agreement */
                strcpy(stmesg.msg_header, "getmemadr");

                /* fill in the other fields in the message */
                strcpy(stmesg.num_mem, "01");
                strcpy(stmesg.mesg_data[0].member_address,
                       commit_mesg.mesg_data.member_address);
                strcpy(stmesg.mesg_data[0].member_status, "failagree");
                stmesg.mesg_len = (19 + 17);
                stmesg.mesg_type = 1L;
                st_mesg_send(stid, &stmesg);

                /* get the list of members that have agreed on a failure */
                stmesg.mesg_type = nl % 10000;
                n = st_mesg_recv(stid, &stmesg);
                if (n < 0)
                    err_sys("data read error");


                /*
                 * Check whether any other members have agreed on a failure.
                 */
                if (strcmp(stmesg.num_mem, "00") != 0) {

                    /*
                     * Get the rank of every member with a failure agreement.
                     */



                    for (ml = 0; ml < atoi(stmesg.num_mem); ml++) {
                        strcpy(gvmesg.mesg_data[ml].member_address,
                               stmesg.mesg_data[ml].member_address);
                    }

                    strcpy(gvmesg.msg_header, "rankmembr");
                    gvmesg.mesg_type = 1L;
                    gview_mesg_send(gvid, &gvmesg);

                    gvmesg.mesg_type = nl % 10000;
                    n = gview_mesg_recv(gvid, &gvmesg);
                    if (n < 0)
                        err_sys("data read error");

                    /*
                     * Do not process the pending commit if any member that
                     * has agreed on a failure ranks below the pending member.
                     */
                    for (ml = 0; ml < atoi(stmesg.num_mem); ml++) {
                        rank_member = atoi(gvmesg.mesg_data[ml].member_rank);
                        if (rank_member < rankpend) {
                            commit_pend_process = FALSE;
                            break;
                        }
                    }
                }


                if (commit_pend_process) {
                    strcpy(commit_mesg.mesg_data.member_address,
                           pending_member);
                    strcpy(commit_mesg.mesg_data.initiator_address,
                           my_address);
                    if (strcmp(pending_status, "joinpendg") == 0)
                        strcpy(commit_mesg.mesg_data.token_type, "joincomit");
                    else
                        strcpy(commit_mesg.mesg_data.token_type, "failcomit");
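Read from the code above (not from a rule stated elsewhere in this appendix), the decision at the end of this loop reduces to a simple rule: the change found by "checkpend" is carried forward only if no member with a "failagree" status has a smaller rank than the pending member, and the commit token type then follows from the pending status. A minimal sketch of that rule, with hypothetical function names and the ranks passed in as plain integers rather than fetched through "rankmembr":

```c
#include <string.h>

/* Return 1 if the pending change should be committed now, 0 if any
 * failure-agreeing member ranks below (before) the pending member;
 * this mirrors how the listing clears commit_pend_process. */
int should_commit_pending(int rankpend, const int ranks[], int n)
{
    int i;

    for (i = 0; i < n; i++)
        if (ranks[i] < rankpend)
            return 0;
    return 1;
}

/* Map the pending status to the commit token type used in the listing. */
const char *commit_token_type(const char *pending_status)
{
    return strcmp(pending_status, "joinpendg") == 0 ? "joincomit"
                                                    : "failcomit";
}
```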


/*
 * Send a message using System V message queues.
 * The mesg_len, mesg_type and mesg_data fields must be filled in
 * by the caller.
 */
commit_mesg_send(id, mesgptr)
int id;
COMMesg *mesgptr;
{
    /*
     * Send the message: the type followed by the optional data.
     */
    if (msgsnd(id, (char *) &(mesgptr->mesg_type),
               mesgptr->mesg_len, 0) != 0)
        err_sys("msgsend error");
}

/*
 * Receive a message from a System V message queue. The caller sets
 * mesg_type to the type wanted; mesg_len, mesg_type and mesg_data
 * are filled in from the message received.
 */
int
commit_mesg_recv(id, mesgptr)
int id;
COMMesg *mesgptr;
{
    int n;

    n = msgrcv(id, &(mesgptr->mesg_type), MAXMESGDATA,
               mesgptr->mesg_type, 0);
    if ((mesgptr->mesg_len = n) < 0)
        err_sys("msgrcv error");
    return (n);
}
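The send and receive wrappers above rely on the type field of a System V message: the servers read requests with type 1, and each reply is sent with a type derived from the client's process id so that several clients can share one queue. The sketch below exercises that convention inside a single process on a private queue. DemoMesg, DATA_MAX and request_reply_roundtrip are hypothetical; the listing itself carries the process id in its proc_id field (written with itoa and read back with atol) rather than calling getpid() on the server side.

```c
#define _XOPEN_SOURCE 700
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <unistd.h>

#define DATA_MAX 64                     /* hypothetical payload size */

/* Same shape as the listing's messages: a long type, then data. */
typedef struct {
    long mesg_type;
    char mesg_data[DATA_MAX];
} DemoMesg;

/* Play both roles on queue id: the client sends a type-1 request, the
 * server takes it and replies with type = client's pid, and the client
 * reads only messages of its own type. Returns reply length or -1. */
int request_reply_roundtrip(int id, const char *request, char *reply_out)
{
    DemoMesg m;
    long client_type = (long) getpid();
    size_t len = strlen(request) + 1;   /* include the NUL */
    ssize_t n;

    if (len > DATA_MAX)
        return -1;

    /* client -> server: type 1, as with mesg_type = 1L in the listing */
    m.mesg_type = 1L;
    memcpy(m.mesg_data, request, len);
    if (msgsnd(id, &m, len, 0) != 0)
        return -1;

    /* server: take the oldest type-1 message */
    if ((n = msgrcv(id, &m, DATA_MAX, 1L, 0)) < 0)
        return -1;

    /* server -> client: reply typed with the client's pid */
    m.mesg_type = client_type;
    if (msgsnd(id, &m, (size_t) n, 0) != 0)
        return -1;

    /* client: wait only for messages of its own type */
    if ((n = msgrcv(id, &m, DATA_MAX, client_type, 0)) < 0)
        return -1;
    memcpy(reply_out, m.mesg_data, (size_t) n);
    return (int) n;
}
```

A positive msgtyp makes msgrcv return only the first message of exactly that type, which is what lets each client pick out its own reply from a queue that also carries other clients' requests.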